Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (6): 44-55. doi: 10.6040/j.issn.1672-3961.0.2018.198

• Machine Learning & Data Mining •

An adaptive ensemble classification method based on deep attribute weighting for data stream

Yao LI, Zhihai WANG*, Yan'ge SUN, Wei ZHANG

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2018-05-25 Online: 2018-12-20 Published: 2018-12-26
  • Contact: Zhihai WANG E-mail: 16120396@bjtu.edu.cn; zhhwang@bjtu.edu.cn
  • Supported by:
    Beijing Natural Science Foundation (4182052); National Natural Science Foundation of China (61672086, 61702030, 61771058)

Abstract:

Most existing ensemble classification algorithms for data streams do not consider the importance of historical data when evaluating base classifiers, and they do not handle interference from irrelevant and noisy attributes. To address both problems, an adaptive ensemble classification method based on deep attribute weighting for data streams (EMDAW) was proposed, which combines multiple naive Bayes models built with deep attribute weighting. Within each data chunk, the contribution of each attribute value to the class attribute was analyzed, and the learned local weights of the attribute values were applied to reduce interference from noisy data. When a base classifier was evaluated, its performance on historical data was weighed against its performance on the most recent data. The base classifiers were then combined by a voting strategy based on both each classifier's confidence on the test instance and its classification accuracy, in order to improve overall classification performance. In comparative experiments with classical algorithms on several benchmark datasets, the proposed algorithm showed advantages in classification accuracy and in adaptability to concept drift.

Key words: data stream, ensemble classification, deep attribute weighting, concept drift, adaptive

CLC Number: 

  • TP391
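
The evaluation and voting ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the EMDAW implementation: the function names `blended_score` and `combined_vote`, the trade-off parameter `alpha`, and the rule of multiplying confidence by the blended accuracy are all assumptions chosen for illustration.

```python
from collections import defaultdict


def blended_score(history_acc, current_acc, alpha=0.5):
    """Blend accuracy on earlier chunks with accuracy on the newest chunk.

    alpha is a hypothetical trade-off parameter (assumption, not from the paper):
    alpha = 1 trusts only the newest chunk, alpha = 0 trusts only the history.
    """
    return alpha * current_acc + (1.0 - alpha) * history_acc


def combined_vote(predictions):
    """Return the label with the largest sum of confidence * classifier score.

    predictions: list of (label, confidence, score) triples,
    one triple per base classifier.
    """
    totals = defaultdict(float)
    for label, confidence, score in predictions:
        totals[label] += confidence * score
    return max(totals, key=totals.get)


# Three hypothetical base classifiers voting on one test instance.
votes = [
    ("A", 0.60, blended_score(0.80, 0.70)),
    ("B", 0.95, blended_score(0.85, 0.95)),
    ("A", 0.55, blended_score(0.60, 0.65)),
]
winner = combined_vote(votes)
print(winner)
```

In this toy example two of the three classifiers predict "A", but the single classifier that is both confident and accurate outweighs them, so the weighted vote returns "B". A plain majority vote would have returned "A".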

Fig.1 Different types of concept drift

Fig.2 Structure of attribute weighted naïve Bayes
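
Since the figure itself is not reproduced here, the general idea of an attribute weighted naive Bayes classifier can be sketched in Python. This is a generic textbook form, in which each attribute's log-likelihood is multiplied by a weight; it is not the paper's deep attribute weighting. The weights below are set by hand and apply per attribute rather than per attribute value, and the names `train_counts` and `predict` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_counts(rows, labels):
    """Collect the frequency counts that naive Bayes needs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(row, model, weights):
    """Score each class as log P(c) + sum_i w_i * log P(a_i | c)."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    k = len(class_counts)
    best_label, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log((nc + 1) / (n + k))  # Laplace-smoothed prior
        for i, value in enumerate(row):
            # Laplace-smoothed conditional probability of this attribute value
            p = (value_counts[(i, c)][value] + 1) / (nc + len(values_seen[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy data: attribute 0 separates the classes, attribute 1 is only weakly related.
rows = [("a", "p"), ("a", "p"), ("a", "q"), ("b", "q"), ("b", "q"), ("b", "p")]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = train_counts(rows, labels)

print(predict(("b", "p"), model, [1.0, 1.0]))  # both attributes count equally
print(predict(("b", "p"), model, [0.0, 1.0]))  # attribute 0 is switched off
```

With equal weights the informative attribute dominates and the instance is labelled "neg". Setting its weight to zero leaves only the weakly related attribute, and the prediction flips to "pos", which shows how the weights control each attribute's influence on the decision.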

Fig.3 Algorithm framework of EMDAW

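Table 1 Characteristics of the datasets

| Dataset | Instances | Attributes | Classes | Noise ratio/% | Drifts | Drift type |
|---|---|---|---|---|---|---|
| HYP | 1 000 000 | 10 | 2 | 5 | 1 | Incremental drift |
| SEA | 1 000 000 | 3 | 4 | 10 | 9 | Sudden drift |
| LEDM | 1 000 000 | 24 | 10 | 10 | 3 | Mixed drift |
| LEDND | 1 000 000 | 24 | 10 | 20 | 0 | |
| Cover type | 581 000 | 53 | 7 | | | Unknown |
| Electricity | 45 000 | 7 | 2 | | | Unknown |
| Poker | 1 000 000 | 10 | 10 | | | Unknown |
| Spam | 9 342 | 500 | 2 | | | Unknown |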

Table 2 Comparison of the classification accuracy of several base classifiers (%)

| Classifier model | HYP | SEAF | LEDm | LEDnd | Cover type | Electricity | Poker | Spam |
|---|---|---|---|---|---|---|---|---|
| DAW | 72.34 | 84.34 | 67.44 | 51.57 | 82.36 | 79.12 | 83.44 | 84.25 |
| NB | 77.48 | 84.86 | 67.14 | 51.27 | 66.04 | 77.88 | 59.46 | 80.25 |
| HOT | 75.46 | 85.78 | 67.22 | 51.13 | 74.93 | 77.12 | 83.36 | 78.09 |

Fig.4 Classification accuracy of different algorithms with different numbers of ensemble classifiers

Fig.5 Average classification accuracy of the proposed algorithm under different parameters

Table 3 Classification accuracy on different datasets with different data chunk sizes (%)

| Dataset / chunk size | 500 | 750 | 1 000 | 1 250 | 1 500 | 1 750 | 2 000 |
|---|---|---|---|---|---|---|---|
| HYP | 84.27 | 85.39 | 85.97 | 86.19 | 86.29 | 86.60 | 86.59 |
| SEAF | 82.80 | 83.56 | 84.25 | 84.95 | 85.19 | 85.44 | 85.53 |
| Electricity | 77.14 | 77.27 | 79.33 | 78.69 | 78.12 | 78.50 | 78.51 |
| Cover type | 83.47 | 83.93 | 84.25 | 81.76 | 81.58 | 81.03 | 79.15 |

Table 4 Classification accuracy on each dataset under different values of parameter k with the ensemble strategy (%)

| Dataset / k | 50 | 100 | 150 | 200 |
|---|---|---|---|---|
| HYP | 84.33 | 85.97 | 85.50 | 85.52 |
| SEAF | 83.80 | 84.71 | 84.83 | 84.33 |
| Electricity | 77.35 | 79.35 | 78.53 | 78.62 |
| Cover type | 84.04 | 84.25 | 84.02 | 83.84 |

Table 5 Average chunk training time of different classification algorithms (ms)

| Dataset | AWE | AUE2 | DDM | NB | Oza | DWM | NSE | EMDAW |
|---|---|---|---|---|---|---|---|---|
| HYP | 239.1 | 156.2 | 1 333.6 | 0.2 | 104.5 | 4 384.2 | 331.6 | 460.4 |
| SEAF | 87.0 | 42.1 | 32.1 | 0.1 | 378.8 | 1 246.2 | 262.5 | 358.1 |
| LEDM | 230.1 | 150.3 | 101.3 | 0.2 | 124.6 | 108.6 | 534.6 | 125.1 |
| LEDND | 230.6 | 150.6 | 120.2 | 0.2 | 132.6 | 120.5 | 834.6 | 142.3 |
| Cover type | 296.6 | 133.2 | 349.4 | 0.8 | 447.3 | 63.5 | 640.0 | 260.9 |
| Electricity | 290.1 | 180.6 | 85.3 | 0.4 | 408.8 | 40.2 | 173.3 | 318.6 |
| Poker | 173.5 | 155.7 | 42.8 | 0.2 | 314.6 | 49.2 | 762.8 | 544.4 |
| Spam | 750.6 | 669.5 | 191.6 | 3.6 | 810.5 | 24.6 | 152.2 | 197.9 |

Table 6 Average classification accuracy of different classification algorithms (%)

| Dataset | AWE | AUE2 | DDM | NB | Oza | DWM | NSE | EMDAW |
|---|---|---|---|---|---|---|---|---|
| HYP | 82.45 | 83.54 | 76.54 | 77.48 | 83.05 | 82.93 | 84.43 | 85.97 |
| SEAF | 84.05 | 86.77 | 84.95 | 84.86 | 85.04 | 85.38 | 83.01 | 84.71 |
| LEDM | 67.08 | 67.58 | 66.70 | 67.14 | 67.55 | 67.12 | 62.86 | 67.21 |
| LEDND | 51.27 | 51.26 | 51.18 | 51.27 | 51.23 | 51.26 | 47.16 | 50.57 |
| Cover type | 81.70 | 84.05 | 74.36 | 66.04 | 80.52 | 77.29 | 79.70 | 84.25 |
| Electricity | 77.67 | 78.21 | 76.25 | 77.88 | 77.34 | 76.69 | 76.70 | 79.35 |
| Poker | 53.87 | 66.88 | 62.14 | 59.46 | 65.19 | 60.72 | 53.73 | 62.60 |
| Spam | 74.86 | 72.23 | 77.25 | 80.25 | 78.92 | 80.26 | 68.78 | 81.37 |

Fig.6 Average classification accuracy of the algorithms with different chunk sizes

Fig.7 Classification accuracy on the SEA dataset

Fig.8 Classification accuracy on the HYP dataset

Fig.9 Classification accuracy on the Electricity dataset

Fig.10 Classification accuracy on the LEDm dataset
