
Journal of Shandong University (Engineering Science), 2023, Vol. 53, Issue (2): 1-10. doi: 10.6040/j.issn.1672-3961.0.2022.136

• Machine Learning and Data Mining •

An intrusion detection model based on improved ReliefF algorithm

Caihui LIU, Qi ZHOU*, Xiaowen YE

  1. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, Jiangxi, China
  • Received: 2022-04-11 Online: 2023-04-22 Published: 2023-04-21
  • Contact: Qi ZHOU E-mail: liu_caihui@163.com; 1203302314@qq.com
  • About the author: Caihui LIU (1979-), male, born in Yudu, Jiangxi, professor, Ph.D.; main research interests: granular computing, machine learning, and data mining. E-mail: liu_caihui@163.com

Abstract:

To address insufficient feature extraction, neglect of feature weights, and imprecise classification in existing intrusion detection algorithms, an intrusion detection model based on an improved ReliefF algorithm is proposed. The improved ReliefF algorithm is obtained by optimizing how feature weights are calculated for intrusion data. A feature correlation table is then built from the Pearson correlation coefficients of the features, and only one of the highly correlated features is retained, which gives a second round of feature optimization. Finally, five classifiers, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), naive Bayes (NB), and support vector machine (SVM), are applied to the optimal feature subset to evaluate the classification performance and accuracy of the method. Experimental results on the NSL-KDD and UNSW-NB15 data sets show that the method achieves good detection performance while effectively reducing the feature dimension, which has a positive effect on the computational complexity of the classifiers.

Key words: ReliefF algorithm, weight optimization, feature selection, intrusion detection, classification

CLC Number:

  • TP3-0

Fig. 1 Intrusion detection model based on machine learning

Fig. 2 Intrusion detection model based on the improved ReliefF algorithm

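Table 1 Data distribution of the NSL-KDD data set

| Class | Number of samples | Percentage/% |
|---|---|---|
| Normal | 77,054 | 51.88 |
| DoS | 53,385 | 35.95 |
| Probe | 14,077 | 9.48 |
| U2R | 252 | 0.17 |
| R2L | 3,749 | 2.52 |
| Total | 148,517 | 100.00 |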

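Table 2 Sample selection from the NSL-KDD data set

| Attack type | Training set (Tr) samples | Test set (Ts) samples |
|---|---|---|
| Normal | 67,343 | 9,711 |
| DoS | 45,927 | 7,458 |
| Probe | 11,656 | 2,421 |
| U2R | 52 | 200 |
| R2L | 995 | 2,754 |
| Total | 125,973 | 22,544 |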

Fig. 3 Flow chart of data preprocessing

Table 3 Feature weight ranking results for 10% and 20% of the data set

| Feature name | Weight (10% of data set) | Weight (20% of data set) |
|---|---|---|
| dst_host_serror_rate | 0.4965 | 0.5010 |
| logged_in | 0.4572 | 0.4388 |
| serror_rate | 0.4289 | 0.4243 |
| srv_serror_rate | 0.3731 | 0.3827 |
| same_srv_rate | 0.2731 | 0.2710 |
| dst_host_srv_serror_rate | 0.2104 | 0.2136 |
| dst_host_same_srv_rate | 0.1824 | 0.1773 |
| dst_host_srv_count | 0.1454 | 0.1394 |
| protocol_type | 0.1311 | 0.1353 |
| dst_host_count | 0.1240 | 0.1312 |
| flag | 0.0896 | 0.0894 |
| service | 0.0857 | 0.0868 |
| dst_host_same_src_port_rate | 0.0726 | 0.0874 |
| count | 0.0458 | 0.0417 |
| dst_host_rerror_rate | 0.0348 | 0.0325 |
| srv_diff_host_rate | 0.0258 | 0.0325 |
| dst_host_diff_srv_rate | 0.0220 | 0.0243 |
| dst_host_srv_rerror_rate | 0.0206 | 0.0170 |
| rerror_rate | 0.0165 | 0.0182 |
| srv_count | 0.0126 | 0.0115 |
| is_guest_login | 0.0106 | 0.0117 |
| diff_srv_rate | 0.0100 | 0.0101 |
| srv_rerror_rate | 0.0100 | 0.0120 |
| wrong_fragment | 0.0071 | 0.0067 |
| dst_host_srv_diff_host_rate | 0.0025 | 0.0042 |
| duration | 0.0019 | 0.0030 |
| root_shell | 0.0016 | 0.0008 |
| su_attempted | 0.0008 | 0.0000 |
| hot | 0.0007 | 0.0008 |
| num_shells | 0.0004 | 0.0000 |
| num_failed_logins | 0.0000 | 0.0000 |
| num_access_files | 0.0000 | 0.0000 |
| src_bytes | 0.0000 | 0.0000 |
| dst_bytes | 0.0000 | 0.0000 |
| land | 0.0000 | 0.0000 |
| urgent | 0.0000 | 0.0000 |
| num_compromised | 0.0000 | 0.0000 |
| num_root | 0.0000 | 0.0000 |
| num_file_creations | 0.0000 | 0.0000 |
| is_hot_login | 0.0000 | 0.0000 |
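
The weights in Table 3 come from the paper's improved ReliefF algorithm, whose exact weight update is not given on this page. As a rough illustration of this family of methods only, the sketch below implements the original two-class Relief update (nearest hit and nearest miss per sample) in Python with NumPy. The function name `relief_weights`, the min-max scaling, the Manhattan distance, and the choice to visit every sample once instead of drawing random samples are all assumptions made for this example; it is not the authors' M-ReliefF.

```python
import numpy as np


def relief_weights(X, y):
    """Basic two-class Relief weights (illustrative, not the paper's M-ReliefF)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Min-max scale each feature so per-feature differences lie in [0, 1].
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - low) / span

    weights = np.zeros(n_features)
    for i in range(n_samples):  # assumption: visit every sample exactly once
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all samples
        dist[i] = np.inf  # a sample is never its own neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that differ on the miss, penalise those that differ on the hit.
        weights += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_samples
    return weights
```

Under these assumptions, a feature that separates the two classes receives a weight near 1, while a feature that varies mostly within a class receives a weight near or below 0. Sorting the returned weights in descending order gives a ranking of the same kind as Table 3.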

Table 4 Number of features in the feature subset

| Before processing | After M-ReliefF processing | After secondary optimization |
|---|---|---|
| 41 | 20 | 20 |

Fig. 4 Correlation heat map
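
The abstract states that a feature correlation table is built from Pearson correlation coefficients and that only one of the highly correlated features is kept. A minimal sketch of that kind of filter follows. The function name `drop_correlated`, the 0.9 threshold, and the rule of keeping whichever feature comes first are illustrative assumptions, since this page does not state the threshold or tie-breaking rule used in the paper.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Keep the first feature of each highly correlated group (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # Pearson correlation matrix with features as columns.
    # Assumption: no column is constant, otherwise its correlations are undefined.
    corr = np.corrcoef(X, rowvar=False)

    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not strongly correlated with a kept feature.
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

If the columns are passed in descending order of feature weight, the higher-weighted member of each correlated pair is the one that survives, which is one plausible way to combine the weight ranking with the correlation step.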

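Table 5 Classification outcome parameters

| Actual class | Predicted as attack | Predicted as normal |
|---|---|---|
| Attack sample | TP | FN |
| Normal sample | FP | TN |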
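
Table 5 defines the four confusion-matrix counts. Assuming the columns of Table 6 follow the usual definitions (ACC as accuracy, RP as precision, RD as detection rate or recall, RFP as false positive rate, and F1 as the harmonic mean of RP and RD), which this page does not spell out, the metrics could be computed as in the sketch below. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute common intrusion detection metrics from confusion-matrix counts.

    Assumes the standard definitions; the page itself does not list the formulas.
    Values are returned as fractions (multiply by 100 for percentages).
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)             # share of predicted attacks that are real
    detection_rate = tp / (tp + fn)        # share of real attacks that are detected
    false_positive_rate = fp / (fp + tn)   # share of normal traffic flagged as attack
    f1 = 2 * precision * detection_rate / (precision + detection_rate)
    return {
        "ACC": accuracy,
        "RP": precision,
        "F1": f1,
        "RD": detection_rate,
        "RFP": false_positive_rate,
    }
```

For example, `detection_metrics(tp=90, fn=10, fp=5, tn=95)` describes a detector that catches 90 of 100 attacks and raises 5 false alarms on 100 normal samples.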

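Table 6 Performance comparison of different classification algorithms

| Data set | Algorithm | ACC/% | RP/% | F1/% | RD/% | RFP/% | Running time/s |
|---|---|---|---|---|---|---|---|
| NSL-KDD | DT | 98.15 | 98.45 | 98.19 | 97.94 | 1.62 | 0.2466 |
| NSL-KDD | MR-DT | 98.01 | 98.32 | 98.05 | 97.79 | 1.75 | 0.1724 |
| NSL-KDD | KNN | 98.79 | 98.92 | 98.81 | 98.71 | 1.12 | 40.3452 |
| NSL-KDD | MR-KNN | 97.89 | 97.89 | 97.94 | 97.70 | 1.90 | 3.1934 |
| NSL-KDD | RF | 99.44 | 99.40 | 99.45 | 99.50 | 0.62 | 5.8796 |
| NSL-KDD | MR-RF | 96.06 | 94.28 | 96.24 | 98.28 | 1.49 | 3.4893 |
| NSL-KDD | NB | 87.02 | 84.19 | 87.89 | 91.93 | 18.14 | 0.2105 |
| NSL-KDD | MR-NB | 85.13 | 82.39 | 86.16 | 90.29 | 20.28 | 0.0521 |
| NSL-KDD | SVM | 96.70 | 94.32 | 96.86 | 99.55 | 6.29 | 2,591.5497 |
| NSL-KDD | MR-SVM | 95.64 | 93.58 | 95.84 | 98.22 | 6.63 | 70.2913 |
| UNSW-NB15 | DT | 95.03 | 96.32 | 96.35 | 96.38 | 7.80 | 2.5196 |
| UNSW-NB15 | MR-DT | 94.96 | 95.83 | 96.33 | 96.84 | 9.09 | 1.0337 |
| UNSW-NB15 | KNN | 93.82 | 94.47 | 95.50 | 96.56 | 11.99 | 126.4007 |
| UNSW-NB15 | MR-KNN | 94.22 | 95.45 | 95.77 | 96.09 | 9.75 | 59.8952 |
| UNSW-NB15 | RF | 95.91 | 96.19 | 97.02 | 97.86 | 0.62 | 24.5422 |
| UNSW-NB15 | MR-RF | 93.71 | 91.54 | 95.58 | 99.99 | 19.68 | 10.3846 |
| UNSW-NB15 | NB | 87.40 | 88.74 | 90.96 | 93.29 | 25.12 | 0.2216 |
| UNSW-NB15 | MR-NB | 85.68 | 84.06 | 90.30 | 97.55 | 39.95 | 0.1182 |
| UNSW-NB15 | SVM | 93.69 | 91.83 | 95.54 | 99.58 | 18.81 | 683.0281 |
| UNSW-NB15 | MR-SVM | 93.94 | 92.12 | 95.74 | 99.65 | 18.39 | 434.9063 |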

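Fig. 5 Comparison of ROC curves of different classification algorithms on the NSL-KDD data set

Fig. 6 Comparison of ROC curves of different classification algorithms on the UNSW-NB15 data set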

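Table 7 Comparison of the proposed method with other models

| Data set | Method | ACC | RD | RFP |
|---|---|---|---|---|
| NSL-KDD | Proposed method | 98.01 | 97.79 | 1.75 |
| NSL-KDD | SSL-3WD[37] | 96.10 | 97.70 | 2.40 |
| NSL-KDD | CNN-TWD[38] | 96.10 | 92.30 | 2.00 |
| NSL-KDD | Simple-TNN[39] | 97.60 | 98.90 | NAN |
| UNSW-NB15 | Proposed method | 94.96 | 96.84 | 9.09 |
| UNSW-NB15 | SSL-3WD[37] | 94.70 | 96.30 | 3.20 |
| UNSW-NB15 | ReliefF-P[40] | NAN | 87.39 | 15.62 |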

References:

[1] SULTANA N, CHILAMKURTI N, PENG W, et al. Survey on SDN based network intrusion detection system using machine learning approaches[J]. Peer-to-Peer Networking and Applications, 2019, 12(2): 493-501. doi: 10.1007/s12083-017-0630-0
[2] SVENMARCK P, LUOTSINEN L, NILSSON M, et al. Possibilities and challenges for artificial intelligence in military applications[C]//Proceedings of the NATO Big Data and Artificial Intelligence for Military Decision Making Specialists' Meeting. Bordeaux, France: Computer Science, 2018: 1-16.
[3] STAMPAR M, FERTALJ K. Artificial intelligence in network intrusion detection[C]//Proceedings of 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). Opatija, Croatia: IEEE, 2015: 1318-1323.
[4] LEE W, STOLFO S J, CHAN P K, et al. Real time data mining-based intrusion detection[C]//Proceedings of DARPA Information Survivability Conference and Exposition Ⅱ (DISCEX'01). Anaheim, USA: IEEE, 2001: 89-100.
[5] KUMAR G, KUMAR K, SACHDEVA M. The use of artificial intelligence based techniques for intrusion detection: a review[J]. Artificial Intelligence Review, 2010, 34(4): 369-387. doi: 10.1007/s10462-010-9179-5
[6] MEHDI S A, KHALID J, KHAYAM S A. Revisiting traffic anomaly detection using software defined networking[C]//Proceedings of International Workshop on Recent Advances in Intrusion Detection. Heidelberg, Germany: Springer, 2011: 161-180.
[7] LAZAREVIC A, ERTOZ L, KUMAR V, et al. A comparative study of anomaly detection schemes in network intrusion detection[C]//Proceedings of the 2003 SIAM International Conference on Data Mining. Philadelphia, USA: SIAM, 2003: 25-36.
[8] YE N, ZHANG Y, BORROR C M. Robustness of the Markov-chain model for cyber-attack detection[J]. IEEE Transactions on Reliability, 2004, 53(1): 116-123. doi: 10.1109/TR.2004.823851
[9] NOVIKOV D, YAMPOLSKIY R V, REZNIK L. Anomaly detection based intrusion detection[C]//Proceedings of Third International Conference on Information Technology: New Generations (ITNG'06). Las Vegas, USA: IEEE, 2006: 420-425.
[10] WANG Wei, DAI Hong, ZHAO Siqi. Intrusion detection method based on feature optimization and BP neural[J]. Computer Engineering and Design, 2021, 42(10): 2755-2761.
[11] TOOSI A N, KAHANI M. A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers[J]. Computer Communications, 2007, 30(10): 2201-2212. doi: 10.1016/j.comcom.2007.05.002
[12] LAHRE M K, DHAR M T, SURESH D, et al. Analyze different approaches for ids using kdd 99 data set[J]. International Journal on Recent and Innovation Trends in Computing and Communication, 2013, 1(8): 645-651.
[13] ZHANG Z, SHEN H. Application of online-training SVMs for real-time intrusion detection with different considerations[J]. Computer Communications, 2005, 28(12): 1428-1442. doi: 10.1016/j.comcom.2005.01.014
[14] TAN S. An intrusion detection method based on stacked autoencoder and support vector machine[C]//Proceedings of Journal of Physics: Conference Series. Xi'an, China: IOP, 2020: 1-7.
[15] KHRAISAT A, GONDAL I, VAMPLEW P, et al. Hybrid intrusion detection system based on the stacking ensemble of c5 decision tree classifier and one class support vector machine[J]. Electronics, 2020, 9(1): 173-191. doi: 10.3390/electronics9010173
[16] LIU W, CI L L, LIU L P. A new method of fuzzy support vector machine algorithm for intrusion detection[J]. Applied Sciences, 2020, 10(3): 1065-1085. doi: 10.3390/app10031065
[17] ILGUN K, KEMMERER R A, PORRAS P A. State transition analysis: a rule-based intrusion detection approach[J]. IEEE Transactions on Software Engineering, 1995, 21(3): 181-199. doi: 10.1109/32.372146
[18] LEE W, STOLFO S J, MOK K W. A data mining framework for building intrusion detection models[C]//Proceedings of the 1999 IEEE Symposium on Security and Privacy. Oakland, USA: IEEE, 1999: 120-132.
[19] LOHIYA R, THAKKAR A. Intrusion detection using deep neural network with antirectifier layer[M]. Singapore: Springer, 2021: 89-105.
[20] LI L H, AHMAD R, TSAI W C, et al. A feature selection based DNN for intrusion detection system[C]//Proceedings of 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM). Seoul, Korea: IEEE, 2021: 1-8.
[21] FARRAHI S V, AHMADZADEH M. KCMC: a hybrid learning approach for network intrusion detection using K-means clustering and multiple classifiers[J]. International Journal of Computer Applications, 2015, 124(9): 18-23. doi: 10.5120/ijca2015905365
[22] PALIWAL S, GUPTA R. Denial-of-service, probing & remote to user (R2L) attack detection using genetic algorithm[J]. International Journal of Computer Applications, 2012, 60(19): 57-62.
[23] PENG K, LEUNG V, ZHENG L, et al. Intrusion detection system based on decision tree over big data in fog environment[J]. Wireless Communications and Mobile Computing, 2018, 2018(1): 1-10.
[24] VIMALKUMAR K, RADHIKA N. A big data framework for intrusion detection in smart grids using apache spark[C]//Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics. Udupi, India: IEEE, 2017: 198-204.
[25] GUO K, SUI L, QIU J, et al. From model to FPGA: software-hardware co-design for efficient neural network acceleration[C]//Proceedings of 2016 IEEE Hot Chips 28 Symposium (HCS). Cupertino, USA: IEEE, 2016: 1-27.
[26] RAJAGOPAL S, KUNDAPUR P P, HAREESHA K S. A stacking ensemble for network intrusion detection using heterogeneous datasets[J]. Security and Communication Networks, 2020, 2020(1): 1-9. doi: 10.1016/S1353-4858(20)30001-5
[27] BALAKRISHNAN S, VENKATALAKSHMI K, KANNAN A. Intrusion detection system using feature selection and classification technique[J]. International Journal of Computer Science and Application, 2014, 3(4): 145-151. doi: 10.14355/ijcsa.2014.0304.02
[28] ZHANG Y, REN X, ZHANG J. Intrusion detection method based on information gain and ReliefF feature selection[C]//Proceedings of 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary: IEEE, 2019: 1-5.
[29] ZHANG J, ZHANG Y, LI K. A network intrusion detection model based on the combination of ReliefF and Borderline-SMOTE[C]//Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence. New York, USA: Association for Computing Machinery, 2020: 199-203.
[30] KIRA K, RENDELL L A. The feature selection problem: traditional methods and a new algorithm[C]//Proceedings of the Tenth National Conference on Artificial Intelligence. San Jose, California: AAAI, 1992: 129-134.
[31] KONONENKO I. Estimating attributes: analysis and extensions of Relief[J]. Lecture Notes in Computer Science, 1994, 784(1): 171-182.
[32] MA Chao. Parallel network intrusion detection method based on ReliefF and improved crow search optimization[J]. Application Research of Computers, 2019, 36(10): 3063-3068.
[33] SUN L, KONG X, XU J, et al. A hybrid gene selection method based on ReliefF and ant colony optimization algorithm for tumor classification[J]. Scientific Reports, 2019, 9(1): 1-14. doi: 10.1038/s41598-018-37186-2
[34] BENESTY J, CHEN J, HUANG Y, et al. Pearson correlation coefficient[M]. Berlin, Germany: Springer, 2009: 1-4.
[35] REVATHI S, MALATHI A. A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection[J]. International Journal of Engineering Research & Technology, 2013, 2(12): 1848-1853.
[36] ROY A, SINGH K J. Multi-classification of UNSW-NB15 dataset for network anomaly detection system[C]//Proceedings of International Conference on Communication and Computational Technologies. Singapore: Springer, 2021: 429-451.
[37] ZHANG Shipeng, LI Yongzhong, DU Xiangtong. Intrusion detection model based on semi-supervised learning and three-way decision[J]. Journal of Computer Applications, 2021, 41(9): 2602-2608.
[38] WU Qirui, HUANG Shucheng. Intrusion detection algorithm combining convolutional neural network and three-branch decision[J]. Computer Engineering and Applications, 2022, 58(13): 119-127.
[39] WANG Zhendong, ZHANG Lin, YANG Shuxin, et al. Construction and analysis of Taylor neural network for intrusion detection[J/OL]. Journal of Frontiers of Computer Science and Technology. (2021-09-09)[2021-11-14]. http://kns.cnki.net/kcms/detail/11.5602.TP.20210909.0906.002.html.
[40] ZHU Shisong, BA Menglong, WANG Hui, et al. An intrusion detection technology based on NBSR model[J]. Computer Engineering & Science, 2020, 42(3): 427-433.