Journal of Shandong University (Engineering Science) ›› 2023, Vol. 53 ›› Issue (2): 1-10. doi: 10.6040/j.issn.1672-3961.0.2022.136

• Machine Learning & Data Mining •

An intrusion detection model based on improved ReliefF algorithm

Caihui LIU, Qi ZHOU*, Xiaowen YE

  1. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, Jiangxi, China
  • Received: 2022-04-11  Online: 2023-04-22  Published: 2023-04-21
  • Contact: Qi ZHOU  E-mail: liu_caihui@163.com; 1203302314@qq.com

Abstract:

Existing intrusion detection algorithms extract features insufficiently, ignore the influence of feature weights, and classify with limited accuracy. To address these problems, an intrusion detection model based on an improved ReliefF algorithm is proposed. The improved ReliefF algorithm optimizes the calculation of feature weights for intrusion data. A feature correlation scale is then established from the Pearson correlation coefficients between features, and only one feature from each group of highly correlated features is retained, which realizes a secondary optimization of the feature set. Finally, decision tree, k-nearest neighbor, random forest, naive Bayes, and support vector machine classifiers are used to evaluate classification performance and accuracy. Experimental results on the NSL-KDD and UNSW-NB15 data sets show that the method not only effectively reduces the feature dimension but also achieves better detection performance, with a positive effect on the computational complexity of the classifiers.
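The two-stage selection described in the abstract (weight the features, then keep one feature from each highly correlated group) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the basic binary Relief update stands in for the paper's improved ReliefF, and the function names, the 0.9 correlation threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic binary Relief: one nearest hit and one nearest miss per draw.

    Stand-in for the paper's improved ReliefF (an assumption of this sketch).
    Assumes every class contains at least two samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(Xs.shape[1])
    for i in rng.integers(0, len(Xs), size=n_iter):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # Reward features that differ on the miss and agree on the hit.
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iter
    return w

def drop_correlated(X, weights, threshold=0.9):
    """Keep only the highest-weight feature of each highly correlated group.

    Correlation is the absolute Pearson coefficient; the threshold is a
    hypothetical value, not one taken from the paper.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(weights)[::-1]:  # visit features from highest weight down
        if all(corr[j, k] < threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)

# Synthetic demo: f0 is informative, f1 nearly duplicates f0, f2 is noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
f0 = y + 0.1 * rng.normal(size=n)
f1 = f0 + 0.01 * rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack([f0, f1, f2])

w = relief_weights(X, y)
kept = drop_correlated(X, w)
print("weights:", np.round(w, 3))
print("kept feature indices:", kept)
```

In a full pipeline, a weight threshold would normally be applied before the correlation step so that low-weight features such as the noise column are discarded first; the sketch keeps the two steps separate to show what each one does.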

Key words: ReliefF algorithm, weight optimization, feature selection, intrusion detection, classification

CLC Number: 

  • TP3-0

Fig.1

Intrusion detection model based on machine learning

Fig.2

Intrusion detection model based on improved ReliefF algorithm

Table 1

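Data distribution of the NSL-KDD data set

| Category | Number of samples | Percentage/% |
|---|---|---|
| Normal | 77,054 | 51.88 |
| DoS | 53,385 | 35.95 |
| Probe | 14,077 | 9.48 |
| U2R | 252 | 0.17 |
| R2L | 3,749 | 2.52 |
| Total | 148,517 | 100.00 |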

Table 2

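NSL-KDD data set sample selection

| Attack type | Training set Tr (number of samples) | Test set Ts (number of samples) |
|---|---|---|
| Normal | 67,343 | 9,711 |
| DoS | 45,927 | 7,458 |
| Probe | 11,656 | 2,421 |
| U2R | 52 | 200 |
| R2L | 995 | 2,754 |
| Total | 125,973 | 22,544 |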

Fig.3

Data preprocessing flowchart

Table 3

Feature weight ranking results for 10% and 20% of the data in the data set

| Feature name | Weight (10% of data set) | Weight (20% of data set) |
|---|---|---|
| dst_host_serror_rate | 0.4965 | 0.5010 |
| logged_in | 0.4572 | 0.4388 |
| serror_rate | 0.4289 | 0.4243 |
| srv_serror_rate | 0.3731 | 0.3827 |
| same_srv_rate | 0.2731 | 0.2710 |
| dst_host_srv_serror_rate | 0.2104 | 0.2136 |
| dst_host_same_srv_rate | 0.1824 | 0.1773 |
| dst_host_srv_count | 0.1454 | 0.1394 |
| protocol_type | 0.1311 | 0.1353 |
| dst_host_count | 0.1240 | 0.1312 |
| flag | 0.0896 | 0.0894 |
| service | 0.0857 | 0.0868 |
| dst_host_same_src_port_rate | 0.0726 | 0.0874 |
| count | 0.0458 | 0.0417 |
| dst_host_rerror_rate | 0.0348 | 0.0325 |
| srv_diff_host_rate | 0.0258 | 0.0325 |
| dst_host_diff_srv_rate | 0.0220 | 0.0243 |
| dst_host_srv_rerror_rate | 0.0206 | 0.0170 |
| rerror_rate | 0.0165 | 0.0182 |
| srv_count | 0.0126 | 0.0115 |
| is_guest_login | 0.0106 | 0.0117 |
| diff_srv_rate | 0.0100 | 0.0101 |
| srv_rerror_rate | 0.0100 | 0.0120 |
| wrong_fragment | 0.0071 | 0.0067 |
| dst_host_srv_diff_host_rate | 0.0025 | 0.0042 |
| duration | 0.0019 | 0.0030 |
| root_shell | 0.0016 | 0.0008 |
| su_attempted | 0.0008 | 0.0000 |
| hot | 0.0007 | 0.0008 |
| num_shells | 0.0004 | 0.0000 |
| num_failed_logins | 0.0000 | 0.0000 |
| num_access_files | 0.0000 | 0.0000 |
| src_bytes | 0.0000 | 0.0000 |
| dst_bytes | 0.0000 | 0.0000 |
| land | 0.0000 | 0.0000 |
| urgent | 0.0000 | 0.0000 |
| num_compromised | 0.0000 | 0.0000 |
| num_root | 0.0000 | 0.0000 |
| num_file_creations | 0.0000 | 0.0000 |
| is_host_login | 0.0000 | 0.0000 |

Table 4

Number of features in the feature subset

| Before processing | After M-ReliefF processing | After secondary optimization |
|---|---|---|
| 41 | 20 | 20 |

Fig.4

Correlation heat map

Table 5

Classification sample parameters

| Sample class | Classified as attack | Classified as normal |
|---|---|---|
| Attack sample | TP | FN |
| Normal sample | FP | TN |
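The indicators reported in Table 6 can be derived from these four counts. The sketch below assumes the standard definitions, which the page itself does not spell out: ACC as overall accuracy, RP as precision, RD as detection rate (recall on attack samples), RFP as false positive rate, and F1 as the harmonic mean of RP and RD. The Table 6 values are consistent with that last reading (for example, RP = 98.45 and RD = 97.94 give F1 ≈ 98.19). The function name and the counts in the usage lines are hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute ACC, RP, RD, RFP and F1 from confusion-matrix counts.

    Assumes standard definitions and non-zero denominators.
    """
    acc = (tp + tn) / (tp + fn + fp + tn)  # overall accuracy
    rp = tp / (tp + fp)                    # precision
    rd = tp / (tp + fn)                    # detection rate (recall)
    rfp = fp / (fp + tn)                   # false positive rate
    f1 = 2 * rp * rd / (rp + rd)           # harmonic mean of RP and RD
    return {"ACC": acc, "RP": rp, "RD": rd, "RFP": rfp, "F1": f1}

# Hypothetical counts, chosen only to exercise the function.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```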

Table 6

Comparison of performance indicators of different classification algorithms

| Data set | Classification algorithm | ACC/% | RP/% | F1/% | RD/% | RFP/% | Running time/s |
|---|---|---|---|---|---|---|---|
| NSL-KDD | DT | 98.15 | 98.45 | 98.19 | 97.94 | 1.62 | 0.2466 |
| | MR-DT | 98.01 | 98.32 | 98.05 | 97.79 | 1.75 | 0.1724 |
| | KNN | 98.79 | 98.92 | 98.81 | 98.71 | 1.12 | 40.3452 |
| | MR-KNN | 97.89 | 97.89 | 97.94 | 97.70 | 1.90 | 3.1934 |
| | RF | 99.44 | 99.40 | 99.45 | 99.50 | 0.62 | 5.8796 |
| | MR-RF | 96.06 | 94.28 | 96.24 | 98.28 | 1.49 | 3.4893 |
| | NB | 87.02 | 84.19 | 87.89 | 91.93 | 18.14 | 0.2105 |
| | MR-NB | 85.13 | 82.39 | 86.16 | 90.29 | 20.28 | 0.0521 |
| | SVM | 96.70 | 94.32 | 96.86 | 99.55 | 6.29 | 2591.5497 |
| | MR-SVM | 95.64 | 93.58 | 95.84 | 98.22 | 6.63 | 70.2913 |
| UNSW-NB15 | DT | 95.03 | 96.32 | 96.35 | 96.38 | 7.80 | 2.5196 |
| | MR-DT | 94.96 | 95.83 | 96.33 | 96.84 | 9.09 | 1.0337 |
| | KNN | 93.82 | 94.47 | 95.50 | 96.56 | 11.99 | 126.4007 |
| | MR-KNN | 94.22 | 95.45 | 95.77 | 96.09 | 9.75 | 59.8952 |
| | RF | 95.91 | 96.19 | 97.02 | 97.86 | 0.62 | 24.5422 |
| | MR-RF | 93.71 | 91.54 | 95.58 | 99.99 | 19.68 | 10.3846 |
| | NB | 87.40 | 88.74 | 90.96 | 93.29 | 25.12 | 0.2216 |
| | MR-NB | 85.68 | 84.06 | 90.30 | 97.55 | 39.95 | 0.1182 |
| | SVM | 93.69 | 91.83 | 95.54 | 99.58 | 18.81 | 683.0281 |
| | MR-SVM | 93.94 | 92.12 | 95.74 | 99.65 | 18.39 | 434.9063 |
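To make the running-time column easier to compare, the short script below computes how many times faster each MR- variant ran than its baseline. It assumes that the MR- prefix denotes the same classifier trained on the reduced feature set, which is an interpretation of the table rather than something stated on this page. The times are copied from Table 6; the `speedups` helper is a hypothetical name.

```python
# Running times in seconds from Table 6: (baseline, MR- variant).
times = {
    "NSL-KDD": {
        "DT": (0.2466, 0.1724),
        "KNN": (40.3452, 3.1934),
        "RF": (5.8796, 3.4893),
        "NB": (0.2105, 0.0521),
        "SVM": (2591.5497, 70.2913),
    },
    "UNSW-NB15": {
        "DT": (2.5196, 1.0337),
        "KNN": (126.4007, 59.8952),
        "RF": (24.5422, 10.3846),
        "NB": (0.2216, 0.1182),
        "SVM": (683.0281, 434.9063),
    },
}

def speedups(times):
    """Return baseline_time / reduced_time for every data set and classifier."""
    return {
        dataset: {clf: base / reduced for clf, (base, reduced) in rows.items()}
        for dataset, rows in times.items()
    }

result = speedups(times)
for dataset, rows in result.items():
    for clf, factor in rows.items():
        print(f"{dataset:10s} {clf:4s} {factor:6.2f}x")
```

The ratios only summarize the reported times; they say nothing about how the accuracy columns change, which should be read from Table 6 directly.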

Fig.5

Comparison of ROC curves of different classification algorithms on the NSL-KDD data set

Fig.6

Comparison of ROC curves of different classification algorithms on the UNSW-NB15 data set

Table 7

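Comparison of results of the proposed method and other models (unit: %)

| Data set | Method/model | ACC | RD | RFP |
|---|---|---|---|---|
| NSL-KDD | Proposed method | 98.01 | 97.79 | 1.75 |
| | SSL-3WD[37] | 96.10 | 97.70 | 2.40 |
| | CNN-TWD[38] | 96.10 | 92.30 | 2.00 |
| | Simple-TNN[39] | 97.60 | 98.90 | NAN |
| UNSW-NB15 | Proposed method | 94.96 | 96.84 | 9.09 |
| | SSL-3WD[37] | 94.70 | 96.30 | 3.20 |
| | ReliefF-P[40] | NAN | 87.39 | 15.62 |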

References:

[1] SULTANA N, CHILAMKURTI N, PENG W, et al. Survey on SDN based network intrusion detection system using machine learning approaches[J]. Peer-to-Peer Networking and Applications, 2019, 12(2): 493-501. doi: 10.1007/s12083-017-0630-0
[2] SVENMARCK P, LUOTSINEN L, NILSSON M, et al. Possibilities and challenges for artificial intelligence in military applications[C]//Proceedings of the NATO Big Data and Artificial Intelligence for Military Decision Making Specialists' Meeting. Bordeaux, France: Computer Science, 2018: 1-16.
[3] STAMPAR M, FERTALJ K. Artificial intelligence in network intrusion detection[C]//Proceedings of 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). Opatija, Croatia: IEEE, 2015: 1318-1323.
[4] LEE W, STOLFO S J, CHAN P K, et al. Real time data mining-based intrusion detection[C]//Proceedings of DARPA Information Survivability Conference and Exposition Ⅱ (DISCEX'01). Anaheim, USA: IEEE, 2001: 89-100.
[5] KUMAR G, KUMAR K, SACHDEVA M. The use of artificial intelligence based techniques for intrusion detection: a review[J]. Artificial Intelligence Review, 2010, 34(4): 369-387. doi: 10.1007/s10462-010-9179-5
[6] MEHDI S A, KHALID J, KHAYAM S A. Revisiting traffic anomaly detection using software defined networking[C]//Proceedings of International Workshop on Recent Advances in Intrusion Detection. Heidelberg, Germany: Springer, 2011: 161-180.
[7] LAZAREVIC A, ERTOZ L, KUMAR V, et al. A comparative study of anomaly detection schemes in network intrusion detection[C]//Proceedings of the 2003 SIAM International Conference on Data Mining. Philadelphia, USA: SIAM, 2003: 25-36.
[8] YE N, ZHANG Y, BORROR C M. Robustness of the Markov-chain model for cyber-attack detection[J]. IEEE Transactions on Reliability, 2004, 53(1): 116-123. doi: 10.1109/TR.2004.823851
[9] NOVIKOV D, YAMPOLSKIY R V, REZNIK L. Anomaly detection based intrusion detection[C]//Proceedings of Third International Conference on Information Technology: New Generations (ITNG'06). Las Vegas, USA: IEEE, 2006: 420-425.
[10] WANG Wei, DAI Hong, ZHAO Siqi. Intrusion detection method based on feature optimization and BP neural[J]. Computer Engineering and Design, 2021, 42(10): 2755-2761.
[11] TOOSI A N, KAHANI M. A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers[J]. Computer Communications, 2007, 30(10): 2201-2212. doi: 10.1016/j.comcom.2007.05.002
[12] LAHRE M K, DHAR M T, SURESH D, et al. Analyze different approaches for IDS using KDD 99 data set[J]. International Journal on Recent and Innovation Trends in Computing and Communication, 2013, 1(8): 645-651.
[13] ZHANG Z, SHEN H. Application of online-training SVMs for real-time intrusion detection with different considerations[J]. Computer Communications, 2005, 28(12): 1428-1442. doi: 10.1016/j.comcom.2005.01.014
[14] TAN S. An intrusion detection method based on stacked autoencoder and support vector machine[C]//Proceedings of Journal of Physics: Conference Series. Xi'an, China: IOP, 2020: 1-7.
[15] KHRAISAT A, GONDAL I, VAMPLEW P, et al. Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine[J]. Electronics, 2020, 9(1): 173-191. doi: 10.3390/electronics9010173
[16] LIU W, CI L L, LIU L P. A new method of fuzzy support vector machine algorithm for intrusion detection[J]. Applied Sciences, 2020, 10(3): 1065-1085. doi: 10.3390/app10031065
[17] ILGUN K, KEMMERER R A, PORRAS P A. State transition analysis: a rule-based intrusion detection approach[J]. IEEE Transactions on Software Engineering, 1995, 21(3): 181-199. doi: 10.1109/32.372146
[18] LEE W, STOLFO S J, MOK K W. A data mining framework for building intrusion detection models[C]//Proceedings of the 1999 IEEE Symposium on Security and Privacy. Oakland, USA: IEEE, 1999: 120-132.
[19] LOHIYA R, THAKKAR A. Intrusion detection using deep neural network with antirectifier layer[M]. Singapore: Springer, 2021: 89-105.
[20] LI L H, AHMAD R, TSAI W C, et al. A feature selection based DNN for intrusion detection system[C]//Proceedings of 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM). Seoul, Korea: IEEE, 2021: 1-8.
[21] FARRAHI S V, AHMADZADEH M. KCMC: a hybrid learning approach for network intrusion detection using K-means clustering and multiple classifiers[J]. International Journal of Computer Applications, 2015, 124(9): 18-23. doi: 10.5120/ijca2015905365
[22] PALIWAL S, GUPTA R. Denial-of-service, probing & remote to user (R2L) attack detection using genetic algorithm[J]. International Journal of Computer Applications, 2012, 60(19): 57-62.
[23] PENG K, LEUNG V, ZHENG L, et al. Intrusion detection system based on decision tree over big data in fog environment[J]. Wireless Communications and Mobile Computing, 2018, 2018(1): 1-10.
[24] VIMALKUMAR K, RADHIKA N. A big data framework for intrusion detection in smart grids using apache spark[C]//Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics. Udupi, India: IEEE, 2017: 198-204.
[25] GUO K, SUI L, QIU J, et al. From model to FPGA: software-hardware co-design for efficient neural network acceleration[C]//Proceedings of 2016 IEEE Hot Chips 28 Symposium (HCS). Cupertino, USA: IEEE, 2016: 1-27.
[26] RAJAGOPAL S, KUNDAPUR P P, HAREESHA K S. A stacking ensemble for network intrusion detection using heterogeneous datasets[J]. Security and Communication Networks, 2020, 2020(1): 1-9. doi: 10.1016/S1353-4858(20)30001-5
[27] BALAKRISHNAN S, VENKATALAKSHMI K, KANNAN A. Intrusion detection system using feature selection and classification technique[J]. International Journal of Computer Science and Application, 2014, 3(4): 145-151. doi: 10.14355/ijcsa.2014.0304.02
[28] ZHANG Y, REN X, ZHANG J. Intrusion detection method based on information gain and ReliefF feature selection[C]//Proceedings of 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary: IEEE, 2019: 1-5.
[29] ZHANG J, ZHANG Y, LI K. A network intrusion detection model based on the combination of ReliefF and Borderline-SMOTE[C]//Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence. New York, USA: Association for Computing Machinery, 2020: 199-203.
[30] KIRA K, RENDELL L A. The feature selection problem: traditional methods and a new algorithm[C]//Proceedings of the Tenth National Conference on Artificial Intelligence. San Jose, California: AAAI, 1992: 129-134.
[31] KONONENKO I. Estimating attributes: analysis and extensions of Relief[J]. Lecture Notes in Computer Science, 1994, 784(1): 171-182.
[32] MA Chao. Parallel network intrusion detection method based on ReliefF and improved crow search optimization[J]. Application Research of Computers, 2019, 36(10): 3063-3068.
[33] SUN L, KONG X, XU J, et al. A hybrid gene selection method based on ReliefF and ant colony optimization algorithm for tumor classification[J]. Scientific Reports, 2019, 9(1): 1-14. doi: 10.1038/s41598-018-37186-2
[34] BENESTY J, CHEN J, HUANG Y, et al. Pearson correlation coefficient[M]. Berlin, Germany: Springer, 2009: 1-4.
[35] REVATHI S, MALATHI A. A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection[J]. International Journal of Engineering Research & Technology, 2013, 2(12): 1848-1853.
[36] ROY A, SINGH K J. Multi-classification of UNSW-NB15 dataset for network anomaly detection system[C]//Proceedings of International Conference on Communication and Computational Technologies. Singapore: Springer, 2021: 429-451.
[37] ZHANG Shipeng, LI Yongzhong, DU Xiangtong. Intrusion detection model based on semi-supervised learning and three-way decision[J]. Journal of Computer Applications, 2021, 41(9): 2602-2608.
[38] WU Qirui, HUANG Shucheng. Intrusion detection algorithm combining convolutional neural network and three-branch decision[J]. Computer Engineering and Applications, 2022, 58(13): 119-127.
[39] WANG Zhendong, ZHANG Lin, YANG Shuxin, et al. Construction and analysis of Taylor neural network for intrusion detection[J/OL]. Journal of Frontiers of Computer Science and Technology. (2021-09-09)[2021-11-14]. http://kns.cnki.net/kcms/detail/11.5602.TP.20210909.0906.002.html.
[40] ZHU Shisong, BA Menglong, WANG Hui, et al. An intrusion detection technology based on NBSR model[J]. Computer Engineering & Science, 2020, 42(3): 427-433.