Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue (6): 44-55. doi: 10.6040/j.issn.1672-3961.0.2018.198
Yao LI, Zhihai WANG*, Yan'ge SUN, Wei ZHANG
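Abstract:
Most existing ensemble classification algorithms for data streams do not take the importance of historical data into account when evaluating classifiers, and they neglect the interference of irrelevant and noisy attributes. To address these problems, an adaptive ensemble classification algorithm for data streams based on deep attribute weighting is proposed, which aims to effectively combine multiple naive Bayes models built with deep attribute weighting. Within each data chunk, the contribution of each attribute value to class membership is analyzed in depth, and the learned local attribute weights are applied to the individual attribute values to reduce the interference of noisy data. When base classifiers are evaluated, the importance of historical data is balanced against that of the most recent data. The sub-classifiers are combined by a voting strategy that joins each classifier's confidence on the test instance with its classification-accuracy weight, in order to improve overall classification performance. In comparative experiments against classical algorithms on multiple benchmark data sets, the proposed algorithm shows certain advantages in classification accuracy and in adaptability to concept drift.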
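The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes categorical attributes, uses the KL divergence between P(class | attribute value) and P(class) as a stand-in for the learned attribute-value weight, blends historical and recent accuracy with a hypothetical factor `alpha`, and votes with accuracy weight times per-instance confidence. The names `WeightedNB`, `ChunkEnsemble`, `alpha` and `max_size` are illustrative and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class WeightedNB:
    """Naive Bayes on categorical attributes with one weight per attribute value."""

    def __init__(self, rows, labels):
        self.classes = sorted(set(labels))
        class_count = Counter(labels)
        n, k = len(labels), len(self.classes)
        self.prior = {c: class_count[c] / n for c in self.classes}
        n_attrs = len(rows[0])
        # counts[j][v][c] = instances in this chunk with attribute j == v and class c
        counts = [defaultdict(Counter) for _ in range(n_attrs)]
        for x, y in zip(rows, labels):
            for j, v in enumerate(x):
                counts[j][v][y] += 1
        self.cond, self.weight = [], []
        for j in range(n_attrs):
            n_vals = len(counts[j])
            cond_j, weight_j = {}, {}
            for v, by_class in counts[j].items():
                # Laplace-smoothed P(a_j = v | c)
                cond_j[v] = {c: (by_class[c] + 1) / (class_count[c] + n_vals)
                             for c in self.classes}
                total = sum(by_class.values())
                # ASSUMPTION: weight = KL(P(c | a_j = v) || P(c)); an attribute
                # value that says nothing about the class gets a weight near 0.
                kl = 0.0
                for c in self.classes:
                    p = (by_class[c] + 1) / (total + k)
                    kl += p * math.log(p / self.prior[c])
                weight_j[v] = kl
            self.cond.append(cond_j)
            self.weight.append(weight_j)

    def posterior(self, x):
        """Normalized class scores: log P(c) + sum_j w[j][v] * log P(v | c)."""
        log_score = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for j, v in enumerate(x):
                if v in self.cond[j]:  # values unseen in this chunk are skipped
                    s += self.weight[j][v] * math.log(self.cond[j][v][c])
            log_score[c] = s
        top = max(log_score.values())
        exp = {c: math.exp(s - top) for c, s in log_score.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, x):
        """Return (most probable class, its normalized score as confidence)."""
        return max(self.posterior(x).items(), key=lambda kv: kv[1])


class ChunkEnsemble:
    """Keeps up to max_size chunk-trained models, each with an accuracy weight."""

    def __init__(self, max_size=5, alpha=0.5):
        self.max_size = max_size
        self.alpha = alpha   # hypothetical share given to historical accuracy
        self.members = []    # list of [model, accuracy_weight]

    def update(self, rows, labels):
        # Re-evaluate existing members on the newest chunk, blending old and new.
        for member in self.members:
            hits = sum(member[0].predict(x)[0] == y for x, y in zip(rows, labels))
            recent = hits / len(labels)
            member[1] = self.alpha * member[1] + (1 - self.alpha) * recent
        # ASSUMPTION: a model trained on the newest chunk starts with weight 1.0.
        self.members.append([WeightedNB(rows, labels), 1.0])
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_size:]

    def predict(self, x):
        votes = Counter()
        for model, acc_weight in self.members:
            label, confidence = model.predict(x)
            votes[label] += acc_weight * confidence
        return votes.most_common(1)[0][0]
```

In use, `update(rows, labels)` would be called once per labelled data chunk and `predict(x)` for each incoming instance. A member trained on an outdated concept loses accuracy weight with every chunk it misclassifies, which is one simple way to obtain the drift adaptation the abstract describes.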