JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2017, Vol. 47 ›› Issue (1): 15-21.doi: 10.6040/j.issn.1672-3961.0.2016.304


Random undersampling and POSS method for software defect prediction

FANG Hao, LI Yun*   

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
  • Received: 2016-07-22  Online: 2017-02-20  Published: 2016-07-22

Abstract: To address the class imbalance problem in software defect prediction, this paper combined POSS (Pareto optimization for subset selection) feature selection with random undersampling, and used an SVM to build the prediction model. The experimental results showed that multiple rounds of random undersampling alleviated the imbalance effectively, and that POSS, which treats subset selection as a bi-objective optimization problem, improved classification accuracy. The effectiveness of the proposed method was verified by comparison with Relief, Fisher, and MI (mutual information).
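The abstract only outlines the pipeline, so the following Python sketch (scikit-learn) is an illustrative reconstruction under stated assumptions rather than the authors' implementation: a synthetic imbalanced dataset from make_classification stands in for a defect dataset, the POSS loop follows the bi-objective Pareto-archive scheme of Qian et al. [19] (minimize cross-validated SVM error, minimize subset size), and the multiple random undersampling rounds are combined by a simple majority-vote SVM ensemble. The iteration budget T, the subset-size budget k, and the number of undersampling rounds are arbitrary illustrative choices, not values from the paper.

# Illustrative sketch only: dataset, budgets, and the voting scheme are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC


def poss_select(X, y, k, T=300, seed=0):
    """POSS-style feature selection (Qian et al., NIPS 2015): treat subset
    selection as a bi-objective problem (classification error vs. subset
    size) and maintain a Pareto archive of binary feature masks."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]

    def error(mask):
        if not mask.any():
            return 1.0  # empty subset gets the worst possible error
        acc = cross_val_score(SVC(kernel="linear"), X[:, mask], y,
                              cv=3, scoring="balanced_accuracy").mean()
        return 1.0 - acc

    archive = [(np.zeros(n, dtype=bool), 1.0)]  # (mask, error) pairs
    for _ in range(T):
        parent, _ = archive[rng.integers(len(archive))]
        child = parent ^ (rng.random(n) < 1.0 / n)  # flip each bit w.p. 1/n
        if child.sum() > 2 * k:                     # size cap used in POSS
            continue
        c_err = error(child)
        # reject the child if an archived mask is at least as good on both objectives
        if any(a_err <= c_err and a.sum() <= child.sum() for a, a_err in archive):
            continue
        # otherwise keep it and drop every mask it now weakly dominates
        archive = [(a, a_err) for a, a_err in archive
                   if not (c_err <= a_err and child.sum() <= a.sum())]
        archive.append((child, c_err))

    feasible = [(a, e) for a, e in archive if 0 < a.sum() <= k]
    if not feasible:                  # fallback: keep every feature
        return np.ones(n, dtype=bool)
    return min(feasible, key=lambda t: t[1])[0]


def undersample(X, y, rng):
    """One round of random undersampling: shrink the majority (non-defective)
    class to the size of the minority (defective) class."""
    minority = y == 1
    keep = rng.choice(np.flatnonzero(~minority), size=minority.sum(), replace=False)
    idx = np.concatenate([np.flatnonzero(minority), keep])
    return X[idx], y[idx]


if __name__ == "__main__":
    # Synthetic stand-in for a software-defect dataset: ~10% defective modules.
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    rng = np.random.default_rng(0)
    mask = poss_select(*undersample(X_tr, y_tr, rng), k=8)  # POSS on one balanced sample
    print("selected features:", np.flatnonzero(mask))

    # Multiple random undersampling: train one SVM per balanced sample and
    # combine the models by majority vote on the test set.
    rounds = 11
    votes = np.zeros(len(y_te))
    for _ in range(rounds):
        Xb, yb = undersample(X_tr, y_tr, rng)
        votes += SVC(kernel="linear").fit(Xb[:, mask], yb).predict(X_te[:, mask])
    y_pred = (votes > rounds / 2).astype(int)

    recall = (y_pred[y_te == 1] == 1).mean()
    print(f"defect-class recall on the test split: {recall:.3f}")

The odd number of undersampling rounds keeps the majority vote unambiguous; since the abstract does not specify how the rounds are aggregated or which subset size was used, both the vote and k = 8 should be read as placeholders.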

Key words: class imbalance, data sampling, feature selection, software defect prediction

CLC Number: 

  • TP391
[1] SONG Q, JIA Z, SHEPPERD M, et al. A general software defect-proneness prediction framework[J].IEEE Transactions on Software Engineering, 2011, 37(3):356-370.
[2] MUNSON J C, KHOSHGOFTAAR T M. Regression modelling of software quality: empirical investigation[J]. Information and Software Technology, 1990, 32(2):106-114.
[3] ZHENG J. Cost-sensitive boosting neural networks for software defect prediction[J]. Expert Systems with Applications, 2010, 37(6):4537-4543.
[4] KHOSHGOFTAAR T M, SELIYA N. Analogy-based practical classification rules for software quality estimation[J].Empirical Software Engineering, 2003, 8(4):325-350.
[5] CHIDAMBER S R, KEMERER C F. A metrics suite for object oriented design[J]. IEEE Transactions on Software Engineering, 1994, 20(6):476-493.
[6] KHOSHGOFTAAR T M, GAO K, NAPOLITANO A. An empirical study of feature ranking techniques for software quality prediction[J].International Journal of Software Engineering and Knowledge Engineering, 2012, 22(2):161-183.
[7] GAO K, KHOSHGOFTAAR T M, WANG H, et al. Choosing software metrics for defect prediction: an investigation on feature selection techniques[J]. Software: Practice and Experience, 2011, 41(5): 579-606.
[8] KHOSHGOFTAAR T M, GAO K, NAPOLITANO A, et al. A comparative study of iterative and non-iterative feature selection techniques for software defect prediction[J]. Information Systems Frontiers, 2014, 16(5): 801-822.
[9] BOEHM B W, PAPACCIO P N. Understanding and controlling software costs[J]. IEEE Transactions on Software Engineering, 1988, 14(10):1462-1477.
[10] YAO Xu, WANG Xiaodan, ZHANG Yuxi. Survey of feature selection methods[J]. Control and Decision, 2012, 27(2):161-166. (in Chinese)
[11] GU Q, LI Z, HAN J. Generalized Fisher score for feature selection[C] //Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011). Barcelona, Spain: AUAI Press, 2011:266-273.
[12] ROBNIK-SIKONJA M, KONONENKO I. Theoretical and empirical analysis of ReliefF and RReliefF[J]. Machine Learning, 2003, 53(1-2):23-69.
[13] GUYON I, WESTON J, BARNHILL S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning, 2002, 46(1-3):389-422.
[14] LIU H, YU L. Toward integrating feature selection algorithms for classification and clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4):491-502.
[15] WOZNICA A, NGUYEN P, KALOUSIS A. Model mining for robust feature selection[C] //Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Beijing, China:ACM, 2012: 913-921.
[16] JONG K, MARCHIORI E, SEBAG M, et al. Feature selection in proteomic pattern data with support vector machines[C] //Proceedings of the 2004 Symposium on Computational Intelligence in Bioinformatics and Computational Biology. La Jolla, USA:IEEE, 2004:41-48.
[17] RODRIGUEZ D, RUIZ R, CUADRADO-GALLEGO J, et al. Detecting fault modules applying feature selection to classifiers[C] //Proceedings of the 2007 IEEE International Conference on Information Reuse and Integration. Las Vegas, USA: IEEE, 2007:667-672.
[18] FORMAN G. An extensive empirical study of feature selection metrics for text classification[J]. Journal of Machine Learning Research, 2003, 3(2):1289-1305.
[19] QIAN C, YU Y, ZHOU Z H. Subset selection by Pareto optimization[C] //Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015). Montreal, Canada: NIPS, 2015:1774-1782.
[20] XU Yan, LI Jintao, WANG Bin, et al. A high performance feature selection method based on the ability to distinguish categories[J]. Journal of Software, 2008, 19(1):82-89. (in Chinese)
[21] MA Yanqing. Internet traffic classification and identification based on machine learning[D]. Jinan: Shandong University, 2014. (in Chinese)
[22] YU Y, YAO X, ZHOU Z H. On the approximation ability of evolutionary optimization with application to minimum set cover[J]. Artificial Intelligence, 2012, 180-181:20-33.
[23] HUANG Y J, POWERS R, MONTELIONE G T. Protein NMR recall, precision, and F-measure scores(RPF scores): structure quality assessment measures based on information retrieval statistics[J]. Journal of the American Chemical Society, 2005, 127(6): 1665-1674.
[24] ZHAO Z, GUO S, XU Q, et al. G-means: a clustering algorithm for intrusion detection[C] //Lecture Notes in Computer Science, vol 5506. Berlin: Springer, 2009:563-570.
[25] WANG S, YAO X. Using class imbalance learning for software defect prediction[J]. IEEE Transactions on Reliability, 2013, 62(2):434-443.
[26] METZ C E. Basic principles of ROC analysis[J].Seminars in Nuclear Medicine, 1978, 8(4):283-298.
[27] KUBAT M, MATWIN S. Addressing the curse of imbalanced training sets: one-sided selection[C] //Proceedings of the Fourteenth International Conference on Machine Learning. Nashville, USA: Morgan Kaufmann, 1997:179-186.