
Journal of Shandong University (Engineering Science) ›› 2023, Vol. 53 ›› Issue (2): 1-10. doi: 10.6040/j.issn.1672-3961.0.2022.136

• Machine Learning and Data Mining •

An intrusion detection model based on an improved ReliefF algorithm

Caihui LIU, Qi ZHOU*, Xiaowen YE

  1. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, Jiangxi, China
  • Received: 2022-04-11 Online: 2023-04-22 Published: 2023-04-21
  • Contact: Qi ZHOU E-mail: liu_caihui@163.com; 1203302314@qq.com
  • About the author: Caihui LIU (born 1979), male, from Yudu, Jiangxi; professor, Ph.D.; research interests: granular computing, machine learning, and data mining. E-mail: liu_caihui@163.com

Abstract:

Existing intrusion detection algorithms suffer from insufficient feature extraction, neglect of feature weights, and imprecise classification. To address these problems, an intrusion detection model based on an improved ReliefF algorithm is proposed. First, the calculation of feature weights for intrusion data is optimized, which yields the improved ReliefF algorithm. Second, the Pearson correlation coefficients of the features are computed to build a feature correlation table, and only one feature from each group of highly correlated features is retained, which gives a second round of feature optimization. Finally, five classifiers, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), naive Bayes (NB), and support vector machine (SVM), are applied to the optimal feature subset to evaluate the classification performance and accuracy of the method. Experimental results on the NSL-KDD and UNSW-NB15 datasets show that the method not only achieves good detection performance but also effectively reduces the feature dimension, which has a positive effect on the computational complexity of the classifiers.

Key words: ReliefF algorithm, weight optimization, feature selection, intrusion detection, classification
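
The abstract does not spell out how the weight calculation is modified, so the following is only a minimal sketch of the standard ReliefF weight estimate that the method starts from, not the improved variant. The function name `relieff_weights`, its parameters, and the use of a scaled Manhattan distance are assumptions made for illustration; categorical features such as `protocol_type` are assumed to be numerically encoded beforehand.

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=5, n_iterations=None, seed=0):
    """Standard ReliefF feature weights (NOT the paper's improved variant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # constant features get a zero difference
    Xs = (X - lo) / span

    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n            # empirical class probabilities

    m = n if n_iterations is None else min(n_iterations, n)
    rng = np.random.default_rng(seed)
    picks = rng.choice(n, size=m, replace=False)

    w = np.zeros(d)
    for i in picks:
        diff = np.abs(Xs - Xs[i])  # per-feature difference to sample i
        dist = diff.sum(axis=1)    # Manhattan distance to sample i
        p_own = priors[classes == y[i]][0]
        for c, p in zip(classes, priors):
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]    # never use the sample as its own neighbor
            if idx.size == 0:
                continue
            k = min(n_neighbors, idx.size)
            nearest = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diff[nearest].mean(axis=0)
            if c == y[i]:
                # Nearest hits: differing from same-class neighbors lowers the weight.
                w -= mean_diff / m
            else:
                # Nearest misses: differing from other classes raises the weight,
                # scaled by the prior of that class among the other classes.
                w += p / (1.0 - p_own) * mean_diff / m
    return w
```

A feature that separates the classes receives a larger weight than one that varies mostly within a class, which is the ordering a weight table such as Table 3 reports.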

CLC Number:

  • TP3-0

Figure 1. Intrusion detection model based on machine learning

Figure 2. Intrusion detection model based on the improved ReliefF algorithm

Table 1. Data distribution of the NSL-KDD dataset

Category   Samples   Percentage/%
Normal      77,054    51.88
DoS         53,385    35.95
Probe       14,077     9.48
U2R            252     0.17
R2L          3,749     2.52
Total      148,517   100.00

Table 2. Sample selection from the NSL-KDD dataset

Attack type   Training set (Tr) samples   Test set (Ts) samples
Normal         67,343                       9,711
DoS            45,927                       7,458
Probe          11,656                       2,421
U2R                52                         200
R2L               995                       2,754
Total         125,973                      22,544

Figure 3. Data preprocessing flowchart

Table 3. Feature weight ranking on 10% and 20% of the dataset

Feature name                   Weight (10% of data)   Weight (20% of data)
dst_host_serror_rate           0.4965                 0.5010
logged_in                      0.4572                 0.4388
serror_rate                    0.4289                 0.4243
srv_serror_rate                0.3731                 0.3827
same_srv_rate                  0.2731                 0.2710
dst_host_srv_serror_rate       0.2104                 0.2136
dst_host_same_srv_rate         0.1824                 0.1773
dst_host_srv_count             0.1454                 0.1394
protocol_type                  0.1311                 0.1353
dst_host_count                 0.1240                 0.1312
flag                           0.0896                 0.0894
service                        0.0857                 0.0868
dst_host_same_src_port_rate    0.0726                 0.0874
count                          0.0458                 0.0417
dst_host_rerror_rate           0.0348                 0.0325
srv_diff_host_rate             0.0258                 0.0325
dst_host_diff_srv_rate         0.0220                 0.0243
dst_host_srv_rerror_rate       0.0206                 0.0170
rerror_rate                    0.0165                 0.0182
srv_count                      0.0126                 0.0115
is_guest_login                 0.0106                 0.0117
diff_srv_rate                  0.0100                 0.0101
srv_rerror_rate                0.0100                 0.0120
wrong_fragment                 0.0071                 0.0067
dst_host_srv_diff_host_rate    0.0025                 0.0042
duration                       0.0019                 0.0030
root_shell                     0.0016                 0.0008
su_attempted                   0.0008                 0.0000
hot                            0.0007                 0.0008
num_shells                     0.0004                 0.0000
num_failed_logins              0.0000                 0.0000
num_access_files               0.0000                 0.0000
src_bytes                      0.0000                 0.0000
dst_bytes                      0.0000                 0.0000
land                           0.0000                 0.0000
urgent                         0.0000                 0.0000
num_compromised                0.0000                 0.0000
num_root                       0.0000                 0.0000
num_file_creations             0.0000                 0.0000
is_hot_login                   0.0000                 0.0000

Table 4. Feature subset sizes

Features before processing   Features after M-ReliefF   Features after secondary optimization
41                           20                         20

Figure 4. Correlation heatmap
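
The secondary optimization described in the abstract (keep only one feature out of each highly correlated group) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the threshold of 0.9, the rule of keeping the feature with the larger weight, and the name `drop_correlated` are all hypothetical, and constant columns are assumed to have been removed so that every Pearson coefficient is defined.

```python
import numpy as np

def drop_correlated(X, names, weights, threshold=0.9):
    """Keep one feature per highly correlated group (hypothetical rule).

    Features are visited from the highest weight to the lowest; a feature is
    kept only if its absolute Pearson correlation with every feature kept so
    far is below `threshold` (an assumed value, not taken from the paper).
    """
    X = np.asarray(X, dtype=float)
    # Pearson correlation matrix between columns (features).
    corr = np.corrcoef(X, rowvar=False)
    order = np.argsort(weights)[::-1]      # highest weight first
    kept = []
    for j in order:
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    # Report the survivors in their original column order.
    return [names[j] for j in sorted(kept)]


# Usage with toy data: column "b" is an exact multiple of column "a".
X_toy = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
selected = drop_correlated(X_toy, ["a", "b", "c"], [0.5, 0.3, 0.1])
```

Visiting features in weight order means that, within a correlated pair, the feature ranked higher by the weighting step is the one that survives.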

Table 5. Classification outcome counts

Sample class     Classified as attack   Classified as normal
Attack sample    TP                     FN
Normal sample    FP                     TN
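
The page lists the evaluation criteria (ACC, RP, F1, RD, RFP) without their formulas. The sketch below assumes the usual definitions in terms of the four counts above: accuracy, precision, F1 score, detection rate (recall), and false positive rate. The F1 values in Table 6 are consistent with the harmonic mean of RP and RD (for DT on NSL-KDD, 2 × 98.45 × 97.94 / (98.45 + 97.94) ≈ 98.19). The function name `detection_metrics` is hypothetical, and the counts are assumed to be large enough that no denominator is zero.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute the five criteria from confusion-matrix counts (as fractions)."""
    total = tp + fn + fp + tn
    acc = (tp + tn) / total            # ACC: share of all samples classified correctly
    rp = tp / (tp + fp)                # RP: precision on the attack class
    rd = tp / (tp + fn)                # RD: detection rate (recall) on attacks
    f1 = 2 * rp * rd / (rp + rd)       # F1: harmonic mean of RP and RD
    rfp = fp / (fp + tn)               # RFP: normal samples flagged as attacks
    return {"ACC": acc, "RP": rp, "F1": f1, "RD": rd, "RFP": rfp}


# Usage with made-up counts (not taken from the paper):
example = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

Multiplying each value by 100 gives percentages in the form used by the comparison tables.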

Table 6. Comparison of performance metrics of different classification algorithms

Dataset     Classifier   ACC/%   RP/%    F1/%    RD/%    RFP/%   Time/s
NSL-KDD     DT           98.15   98.45   98.19   97.94    1.62      0.2466
NSL-KDD     MR-DT        98.01   98.32   98.05   97.79    1.75      0.1724
NSL-KDD     KNN          98.79   98.92   98.81   98.71    1.12     40.3452
NSL-KDD     MR-KNN       97.89   97.89   97.94   97.70    1.90      3.1934
NSL-KDD     RF           99.44   99.40   99.45   99.50    0.62      5.8796
NSL-KDD     MR-RF        96.06   94.28   96.24   98.28    1.49      3.4893
NSL-KDD     NB           87.02   84.19   87.89   91.93   18.14      0.2105
NSL-KDD     MR-NB        85.13   82.39   86.16   90.29   20.28      0.0521
NSL-KDD     SVM          96.70   94.32   96.86   99.55    6.29   2591.5497
NSL-KDD     MR-SVM       95.64   93.58   95.84   98.22    6.63     70.2913
UNSW-NB15   DT           95.03   96.32   96.35   96.38    7.80      2.5196
UNSW-NB15   MR-DT        94.96   95.83   96.33   96.84    9.09      1.0337
UNSW-NB15   KNN          93.82   94.47   95.50   96.56   11.99    126.4007
UNSW-NB15   MR-KNN       94.22   95.45   95.77   96.09    9.75     59.8952
UNSW-NB15   RF           95.91   96.19   97.02   97.86    0.62     24.5422
UNSW-NB15   MR-RF        93.71   91.54   95.58   99.99   19.68     10.3846
UNSW-NB15   NB           87.40   88.74   90.96   93.29   25.12      0.2216
UNSW-NB15   MR-NB        85.68   84.06   90.30   97.55   39.95      0.1182
UNSW-NB15   SVM          93.69   91.83   95.54   99.58   18.81    683.0281
UNSW-NB15   MR-SVM       93.94   92.12   95.74   99.65   18.39    434.9063
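
To make the running-time column easier to read, the short script below computes how many times faster each MR- variant runs than its baseline. The times are copied from the table above; the interpretation that the MR- prefix marks the classifier trained on the reduced feature subset is an assumption, and the names `TIMES` and `speedups` are introduced only for this illustration.

```python
# Running times in seconds copied from the table: (baseline, MR- variant).
TIMES = {
    ("NSL-KDD", "DT"): (0.2466, 0.1724),
    ("NSL-KDD", "KNN"): (40.3452, 3.1934),
    ("NSL-KDD", "RF"): (5.8796, 3.4893),
    ("NSL-KDD", "NB"): (0.2105, 0.0521),
    ("NSL-KDD", "SVM"): (2591.5497, 70.2913),
    ("UNSW-NB15", "DT"): (2.5196, 1.0337),
    ("UNSW-NB15", "KNN"): (126.4007, 59.8952),
    ("UNSW-NB15", "RF"): (24.5422, 10.3846),
    ("UNSW-NB15", "NB"): (0.2216, 0.1182),
    ("UNSW-NB15", "SVM"): (683.0281, 434.9063),
}

def speedups(times):
    """Return baseline time divided by MR- time for every (dataset, classifier)."""
    return {key: base / reduced for key, (base, reduced) in times.items()}


ratios = speedups(TIMES)
for (dataset, clf), ratio in sorted(ratios.items()):
    print(f"{dataset:10s} {clf:4s} {ratio:6.2f}x")
```

Every ratio is greater than 1, which matches the abstract's statement that reducing the feature dimension lowers the computational cost of the classifiers; the largest gain is for SVM on NSL-KDD.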

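Figure 5. ROC curves of different classification algorithms on the NSL-KDD dataset

Figure 6. ROC curves of different classification algorithms on the UNSW-NB15 dataset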

Table 7. Comparison of the proposed method with other models

Dataset     Method            ACC/%   RD/%    RFP/%
NSL-KDD     Proposed method   98.01   97.79    1.75
NSL-KDD     SSL-3WD[37]       96.10   97.70    2.40
NSL-KDD     CNN-TWD[38]       96.10   92.30    2.00
NSL-KDD     Simple-TNN[39]    97.60   98.90   NAN
UNSW-NB15   Proposed method   94.96   96.84    9.09
UNSW-NB15   SSL-3WD[37]       94.70   96.30    3.20
UNSW-NB15   ReliefF-P[40]     NAN     87.39   15.62

References:

[1] SULTANA N, CHILAMKURTI N, PENG W, et al. Survey on SDN based network intrusion detection system using machine learning approaches[J]. Peer-to-Peer Networking and Applications, 2019, 12(2): 493-501. doi: 10.1007/s12083-017-0630-0
[2] SVENMARCK P, LUOTSINEN L, NILSSON M, et al. Possibilities and challenges for artificial intelligence in military applications[C]//Proceedings of the NATO Big Data and Artificial Intelligence for Military Decision Making Specialists' Meeting. Bordeaux, France: Computer Science, 2018: 1-16.
[3] STAMPAR M, FERTALJ K. Artificial intelligence in network intrusion detection[C]//Proceedings of 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). Opatija, Croatia: IEEE, 2015: 1318-1323.
[4] LEE W, STOLFO S J, CHAN P K, et al. Real time data mining-based intrusion detection[C]//Proceedings of DARPA Information Survivability Conference and Exposition II (DISCEX'01). Anaheim, USA: IEEE, 2001: 89-100.
[5] KUMAR G, KUMAR K, SACHDEVA M. The use of artificial intelligence based techniques for intrusion detection: a review[J]. Artificial Intelligence Review, 2010, 34(4): 369-387. doi: 10.1007/s10462-010-9179-5
[6] MEHDI S A, KHALID J, KHAYAM S A. Revisiting traffic anomaly detection using software defined networking[C]//Proceedings of International Workshop on Recent Advances in Intrusion Detection. Heidelberg, Germany: Springer, 2011: 161-180.
[7] LAZAREVIC A, ERTOZ L, KUMAR V, et al. A comparative study of anomaly detection schemes in network intrusion detection[C]//Proceedings of the 2003 SIAM International Conference on Data Mining. Philadelphia, USA: SIAM, 2003: 25-36.
[8] YE N, ZHANG Y, BORROR C M. Robustness of the Markov-chain model for cyber-attack detection[J]. IEEE Transactions on Reliability, 2004, 53(1): 116-123. doi: 10.1109/TR.2004.823851
[9] NOVIKOV D, YAMPOLSKIY R V, REZNIK L. Anomaly detection based intrusion detection[C]//Proceedings of Third International Conference on Information Technology: New Generations (ITNG'06). Las Vegas, USA: IEEE, 2006: 420-425.
[10] WANG Wei, DAI Hong, ZHAO Siqi. Intrusion detection method based on feature optimization and BP neural[J]. Computer Engineering and Design, 2021, 42(10): 2755-2761.
[11] TOOSI A N, KAHANI M. A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers[J]. Computer Communications, 2007, 30(10): 2201-2212. doi: 10.1016/j.comcom.2007.05.002
[12] LAHRE M K, DHAR M T, SURESH D, et al. Analyze different approaches for ids using kdd 99 data set[J]. International Journal on Recent and Innovation Trends in Computing and Communication, 2013, 1(8): 645-651.
[13] ZHANG Z, SHEN H. Application of online-training SVMs for real-time intrusion detection with different considerations[J]. Computer Communications, 2005, 28(12): 1428-1442. doi: 10.1016/j.comcom.2005.01.014
[14] TAN S. An intrusion detection method based on stacked autoencoder and support vector machine[C]//Proceedings of Journal of Physics: Conference Series. Xi'an, China: IOP, 2020: 1-7.
[15] KHRAISAT A, GONDAL I, VAMPLEW P, et al. Hybrid intrusion detection system based on the stacking ensemble of c5 decision tree classifier and one class support vector machine[J]. Electronics, 2020, 9(1): 173-191. doi: 10.3390/electronics9010173
[16] LIU W, CI L L, LIU L P. A new method of fuzzy support vector machine algorithm for intrusion detection[J]. Applied Sciences, 2020, 10(3): 1065-1085. doi: 10.3390/app10031065
[17] ILGUN K, KEMMERER R A, PORRAS P A. State transition analysis: a rule-based intrusion detection approach[J]. IEEE Transactions on Software Engineering, 1995, 21(3): 181-199. doi: 10.1109/32.372146
[18] LEE W, STOLFO S J, MOK K W. A data mining framework for building intrusion detection models[C]//Proceedings of the 1999 IEEE Symposium on Security and Privacy. Oakland, USA: IEEE, 1999: 120-132.
[19] LOHIYA R, THAKKAR A. Intrusion detection using deep neural network with antirectifier layer[M]. Singapore: Springer, 2021: 89-105.
[20] LI L H, AHMAD R, TSAI W C, et al. A feature selection based DNN for intrusion detection system[C]//Proceedings of 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM). Seoul, Korea: IEEE, 2021: 1-8.
[21] FARRAHI S V, AHMADZADEH M. KCMC: a hybrid learning approach for network intrusion detection using K-means clustering and multiple classifiers[J]. International Journal of Computer Applications, 2015, 124(9): 18-23. doi: 10.5120/ijca2015905365
[22] PALIWAL S, GUPTA R. Denial-of-service, probing & remote to user (R2L) attack detection using genetic algorithm[J]. International Journal of Computer Applications, 2012, 60(19): 57-62.
[23] PENG K, LEUNG V, ZHENG L, et al. Intrusion detection system based on decision tree over big data in fog environment[J]. Wireless Communications and Mobile Computing, 2018, 2018(1): 1-10.
[24] VIMALKUMAR K, RADHIKA N. A big data framework for intrusion detection in smart grids using apache spark[C]//Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics. Udupi, India: IEEE, 2017: 198-204.
[25] GUO K, SUI L, QIU J, et al. From model to FPGA: software-hardware co-design for efficient neural network acceleration[C]//Proceedings of 2016 IEEE Hot Chips 28 Symposium (HCS). Cupertino, USA: IEEE, 2016: 1-27.
[26] RAJAGOPAL S, KUNDAPUR P P, HAREESHA K S. A stacking ensemble for network intrusion detection using heterogeneous datasets[J]. Security and Communication Networks, 2020, 2020(1): 1-9. doi: 10.1016/S1353-4858(20)30001-5
[27] BALAKRISHNAN S, VENKATALAKSHMI K, KANNAN A. Intrusion detection system using feature selection and classification technique[J]. International Journal of Computer Science and Application, 2014, 3(4): 145-151. doi: 10.14355/ijcsa.2014.0304.02
[28] ZHANG Y, REN X, ZHANG J. Intrusion detection method based on information gain and ReliefF feature selection[C]//Proceedings of 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary: IEEE, 2019: 1-5.
[29] ZHANG J, ZHANG Y, LI K. A network intrusion detection model based on the combination of ReliefF and Borderline-SMOTE[C]//Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence. New York, USA: Association for Computing Machinery, 2020: 199-203.
[30] KIRA K, RENDELL L A. The feature selection problem: traditional methods and a new algorithm[C]//Proceedings of the Tenth National Conference on Artificial Intelligence. San Jose, California: AAAI, 1992: 129-134.
[31] KONONENKO I. Estimating attributes: analysis and extensions of Relief[J]. Lecture Notes in Computer Science, 1994, 784(1): 171-182.
[32] MA Chao. Parallel network intrusion detection method based on ReliefF and improved crow search optimization[J]. Application Research of Computers, 2019, 36(10): 3063-3068.
[33] SUN L, KONG X, XU J, et al. A hybrid gene selection method based on ReliefF and ant colony optimization algorithm for tumor classification[J]. Scientific Reports, 2019, 9(1): 1-14. doi: 10.1038/s41598-018-37186-2
[34] BENESTY J, CHEN J, HUANG Y, et al. Pearson correlation coefficient[M]. Berlin, Germany: Springer, 2009: 1-4.
[35] REVATHI S, MALATHI A. A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection[J]. International Journal of Engineering Research & Technology, 2013, 2(12): 1848-1853.
[36] ROY A, SINGH K J. Multi-classification of UNSW-NB15 dataset for network anomaly detection system[C]//Proceedings of International Conference on Communication and Computational Technologies. Singapore: Springer, 2021: 429-451.
[37] ZHANG Shipeng, LI Yongzhong, DU Xiangtong. Intrusion detection model based on semi-supervised learning and three-way decision[J]. Journal of Computer Applications, 2021, 41(9): 2602-2608.
[38] WU Qirui, HUANG Shucheng. Intrusion detection algorithm combining convolutional neural network and three-branch decision[J]. Computer Engineering and Applications, 2022, 58(13): 119-127.
[39] WANG Zhendong, ZHANG Lin, YANG Shuxin, et al. Construction and analysis of Taylor neural network for intrusion detection[J/OL]. Journal of Frontiers of Computer Science and Technology. (2021-09-09)[2021-11-14]. http://kns.cnki.net/kcms/detail/11.5602.TP.20210909.0906.002.html.
[40] ZHU Shisong, BA Menglong, WANG Hui, et al. An intrusion detection technology based on NBSR model[J]. Computer Engineering & Science, 2020, 42(3): 427-433.