JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2011, Vol. 41 ›› Issue (3): 7-11.


Ensemble learning based feature selection for imbalanced problems

LI Xia1, WANG Lian-xi2, JIANG Sheng-yi1   

  1. School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China;
  2. Department of Business and Trade, Guangdong Vocational College of Science and Trade, Guangzhou 510640, China
  Received: 2011-02-01; Online: 2011-06-16; Published: 2011-02-01

Abstract:

Traditional feature selection methods mainly aim at optimal accuracy without fully considering the data distribution, and therefore do not achieve promising results on imbalanced datasets. A new feature selection method for imbalanced datasets is proposed, based on modifying the data distribution. The approach modifies the data distribution repeatedly by sampling with replacement, so that in each new dataset the number of instances of each large class equals the number of minority-class samples. Feature selection is then applied to each new dataset, and the final features are produced by an ensemble-learning voting mechanism: a feature is retained if it is selected on more than half of the new training datasets. Experimental results on several UCI datasets show that the proposed method is an effective feature selection approach for imbalanced problems.
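The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base feature selector, so `top_k_features` (a simple ratio of class-mean gap to within-class spread) is a hypothetical stand-in, and the function names, the parameters `n_rounds` and `k`, and the choice to keep the smallest class whole while resampling only the larger classes are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def group_by_class(X, y):
    """Group feature rows by their class label."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    return rows_by_class


def balanced_resample(X, y, rng):
    """Build one balanced dataset (assumption: the smallest class is kept
    whole; every larger class is drawn down to its size with replacement)."""
    rows_by_class = group_by_class(X, y)
    n_min = min(len(rows) for rows in rows_by_class.values())
    Xb, yb = [], []
    for label, rows in rows_by_class.items():
        if len(rows) == n_min:
            picked = list(rows)
        else:
            picked = [rng.choice(rows) for _ in range(n_min)]
        Xb.extend(picked)
        yb.extend([label] * n_min)
    return Xb, yb


def top_k_features(X, y, k):
    """Hypothetical base selector: rank features by the gap between class
    means divided by the average within-class standard deviation."""
    rows_by_class = group_by_class(X, y)
    scores = []
    for j in range(len(X[0])):
        columns = [[row[j] for row in rows] for rows in rows_by_class.values()]
        means = [mean(col) for col in columns]
        spread = mean([pstdev(col) for col in columns])
        scores.append((max(means) - min(means)) / (spread + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def ensemble_select(X, y, n_rounds=11, k=1, seed=0):
    """Keep the features chosen on more than half of the balanced datasets."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        Xb, yb = balanced_resample(X, y, rng)
        votes.update(top_k_features(Xb, yb, k))
    return sorted(f for f, n in votes.items() if n > n_rounds / 2)
```

Calling `ensemble_select(X, y)` on a list of feature rows `X` and labels `y` returns the indices of the features that won a majority vote. Any other selector can be substituted for `top_k_features`, since the resampling and voting steps only require that it return a list of feature indices.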
 

Key words: imbalanced data, feature selection, ensemble learning, sampling
