Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (4): 8-13. doi: 10.6040/j.issn.1672-3961.0.2019.050

• Machine Learning & Data Mining •

An ensemble learning algorithm for unbalanced data classification

Zongtang ZHANG1, Sen WANG2, Shilin SUN1

  1. Navigation and Observation Department, Navy Submarine Academy, Qingdao 266000, Shandong, China
  2. 91154 force, Sanya 572000, Hainan, China
  • Received: 2019-01-30  Online: 2019-08-20  Published: 2019-08-06

Abstract:

To address the unbalanced data classification problem in underwater acoustic target recognition, a random subspace AdaBoost algorithm called RSBoost was proposed. Sub-training sample sets were extracted by the random subspace method in different underwater acoustic feature spaces, and a base classifier was trained on every sub-training sample set. The base classifier with the largest mean margin on the minority class was chosen as the base classifier of the current round, and the final ensemble classifier was formed iteratively. Experiments were carried out on measured data, and the performance of RSBoost and AdaBoost in different feature spaces was evaluated by F-measure and G-mean. Averaged over the feature spaces, the F-measure rose from 0.07 for AdaBoost to 0.22 for RSBoost and the G-mean rose from 0.18 to 0.25, indicating that RSBoost outperforms AdaBoost on the underwater acoustic unbalanced classification problem.
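The round structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base classifier is assumed to be a weighted decision stump, the minority class is assumed to carry label +1, the margin of a sample is assumed to be the normalized weighted vote y·F(x)/Σα, and the names `train_stump`, `rsboost_fit`, `rsboost_predict`, `n_subspaces`, and `subspace_size` are hypothetical.

```python
import math
import random

def train_stump(X, y, w, features):
    """Fit a weighted decision stump using only the given feature indices."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                # Weighted error of "predict sign if x[f] > thr, else -sign".
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best  # (weighted error, feature index, threshold, sign)

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] > thr else -sign

def rsboost_fit(X, y, rounds=5, n_subspaces=3, subspace_size=2, seed=0):
    """AdaBoost-style loop; each round keeps the random-subspace candidate
    with the largest mean margin on the minority class (assumed label +1)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    ensemble = []            # list of (alpha, stump)
    scores = [0.0] * n       # running sum of alpha_t * h_t(x_i)
    for _ in range(rounds):
        best = None
        for _ in range(n_subspaces):
            feats = rng.sample(range(d), subspace_size)   # random subspace
            stump = train_stump(X, y, w, feats)
            # Clamp the error so alpha stays finite and positive.
            err = min(max(stump[0], 1e-10), 0.5 - 1e-10)
            alpha = 0.5 * math.log((1 - err) / err)
            preds = [stump_predict(stump, row) for row in X]
            total = sum(a for a, _ in ensemble) + alpha
            # Assumed margin: normalized vote y * F(x) / sum(alpha).
            margin = sum(y[i] * (scores[i] + alpha * preds[i])
                         for i in minority) / (total * len(minority))
            if best is None or margin > best[0]:
                best = (margin, alpha, stump, preds)
        _, alpha, stump, preds = best
        ensemble.append((alpha, stump))
        scores = [s + alpha * p for s, p in zip(scores, preds)]
        # Standard AdaBoost reweighting followed by normalization.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def rsboost_predict(ensemble, row):
    vote = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if vote > 0 else -1

# Hypothetical toy data: 4 majority samples (-1) and 2 minority samples (+1).
X_toy = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3],
         [0.3, 0.3, 0.2], [1.0, 1.0, 1.0], [0.9, 1.2, 1.1]]
y_toy = [-1, -1, -1, -1, 1, 1]
model = rsboost_fit(X_toy, y_toy)
```

The only departure from plain AdaBoost in this sketch is the inner loop over candidate subspaces: the candidate is selected by minority-class mean margin rather than by weighted error alone, which is how the abstract describes the choice of each round's base classifier.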

Key words: unbalanced data, ensemble learning, underwater acoustic target recognition, AdaBoost algorithm, random subspace

CLC Number: TP391

Fig. 1  Flow chart of the RSBoost algorithm

Table 1  Confusion matrix for the two-class problem

Class              Predicted minority   Predicted majority
Actual minority    TP                   FN
Actual majority    FP                   TN
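The two evaluation metrics named in the abstract can be computed from the four cells of the confusion matrix in Table 1. The sketch below assumes the common definitions, which this page does not spell out: F-measure as the weighted harmonic mean of precision and recall (with `beta = 1` by default), and G-mean as the geometric mean of the minority-class and majority-class recall. The function names and the example counts are hypothetical.

```python
import math

def f_measure(tp, fn, fp, beta=1.0):
    """F-measure of the minority class; beta = 1 gives the usual F1 score."""
    if tp == 0:
        return 0.0  # no minority sample recognized
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def g_mean(tp, fn, fp, tn):
    """Geometric mean of minority recall (TPR) and majority recall (TNR)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

# Hypothetical counts for illustration only (not taken from the paper).
tp, fn, fp, tn = 8, 2, 4, 86
print(f_measure(tp, fn, fp), g_mean(tp, fn, fp, tn))
```

Both metrics drop to zero when no minority sample is recognized, which is why they are preferred over plain accuracy on unbalanced data.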

Table 2  Feature dimensions of the data

Feature     Demon spectrum   Power spectrum   Higher-order spectrum   MFCC   Wavelet
Dimension   512              14               19                      36     16

Fig. 2  F-measure of the two algorithms

Fig. 3  G-mean of the two algorithms

Table 3  Performance of the two algorithms in different feature spaces

Feature set             F-measure             G-mean
                        RSBoost   AdaBoost    RSBoost   AdaBoost
Demon                   0.30      0.14        0.22      0.29
Power spectrum          0.18      0.06        0.26      0.17
Higher-order spectrum   0.18      0.04        0.18      0.11
MFCC                    0.22      0.05        0.31      0.14
Wavelet                 0.21      0.07        0.26      0.18
Average                 0.22      0.07        0.25      0.18
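As a quick consistency check, the average row of Table 3 can be recomputed from the five per-feature rows (Demon, power spectrum, higher-order spectrum, MFCC, wavelet). The sketch below only re-enters the values printed in the table; the variable names are hypothetical, and the average is assumed to be a plain arithmetic mean rounded to two decimals.

```python
# Columns: F-measure RSBoost, F-measure AdaBoost, G-mean RSBoost, G-mean AdaBoost
rows = {
    "Demon":                 (0.30, 0.14, 0.22, 0.29),
    "Power spectrum":        (0.18, 0.06, 0.26, 0.17),
    "Higher-order spectrum": (0.18, 0.04, 0.18, 0.11),
    "MFCC":                  (0.22, 0.05, 0.31, 0.14),
    "Wavelet":               (0.21, 0.07, 0.26, 0.18),
}
stated_average = (0.22, 0.07, 0.25, 0.18)

# Arithmetic mean of each column, rounded to two decimals as in the table.
column_means = tuple(
    round(sum(r[i] for r in rows.values()) / len(rows), 2) for i in range(4)
)

# Number of feature sets on which RSBoost scores higher than AdaBoost.
f_wins = sum(r[0] > r[1] for r in rows.values())
g_wins = sum(r[2] > r[3] for r in rows.values())
print(column_means, f_wins, g_wins)
```

Counting wins per feature set makes visible something the averages hide: in the table as printed, the Demon row is the one case where AdaBoost has the higher G-mean.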