
Journal of Shandong University (Engineering Science), 2023, Vol. 53, Issue (4): 83-92. doi: 10.6040/j.issn.1672-3961.0.2022.126

• Machine Learning and Data Mining •

Boosting classification algorithm for imbalanced drift data stream based on dynamic ensemble selection

ZHANG Xilong, HAN Meng*, CHEN Zhiqiang, WU Hongxin, LI Muhang

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
  • Published: 2023-08-18
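  • About the authors: ZHANG Xilong (born 1996), male, from Yinchuan, Ningxia; master's student; research interest: data stream classification. E-mail: 15168822238@163.com. *Corresponding author: HAN Meng (born 1982), female, from Shangqiu, Henan; professor, Ph.D., master's supervisor; research interest: data mining. E-mail: 2003051@nmu.edu.cn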
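  • Funding:
    National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2020AAC03216; 2022AAC03279); Graduate Innovation Project of North Minzu University (YCX21085)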

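Abstract: In a data stream the complete training set cannot be collected at once, and the data may be class-imbalanced and mixed with concept drift, both of which degrade classification performance. To address this, an online Boosting classification algorithm for imbalanced drifting data streams based on dynamic ensemble selection is proposed. The algorithm combines several balancing measures: it resamples the data stream using a Poisson distribution, and when the data are highly imbalanced it performs a second round of sampling from a window that stores minority-class instances, so that the current data become balanced. To improve processing efficiency, a classifier-selection ensemble strategy is proposed that dynamically adjusts the number of classifiers, and while the algorithm runs, an adaptive window detector is used to detect concept drift. Experimental results show that the algorithm improves the true positive rate on the minority class and the running efficiency to a certain extent, and achieves good classification performance on imbalanced data streams with concept drift.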
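The two-level balancing step described in the abstract (Poisson resampling, then secondary sampling from a window of stored minority-class instances) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PoissonWindowResampler`, the time-decayed class-size estimate, the rule lambda = (majority size) / (size of the arriving class), the replay of one stored minority instance per majority arrival, and the `imbalance_threshold` of 0.1 are all hypothetical choices.

```python
import math
import random
from collections import deque


class PoissonWindowResampler:
    """Hypothetical sketch of Poisson resampling plus a minority-class window.

    Assumptions (not from the paper): binary labels 0/1, time-decayed class
    sizes, lambda = majority size / size of the arriving class, and "highly
    imbalanced" meaning the minority share is below imbalance_threshold.
    """

    def __init__(self, window_size=100, decay=0.99, imbalance_threshold=0.1, seed=0):
        self.rng = random.Random(seed)
        self.decay = decay
        self.imbalance_threshold = imbalance_threshold
        self.class_size = {0: 0.0, 1: 0.0}                 # time-decayed class frequencies
        self.minority_window = deque(maxlen=window_size)   # most recent minority instances

    def _poisson(self, lam):
        # Knuth's multiplication method; adequate for the moderate lambdas used here
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= limit:
                return k
            k += 1

    def resample(self, x, y):
        """Return the (x, y) pairs a base learner should train on for this arrival."""
        for c in self.class_size:  # exponential decay keeps estimates current under drift
            hit = 1.0 if c == y else 0.0
            self.class_size[c] = self.decay * self.class_size[c] + (1 - self.decay) * hit
        majority = max(self.class_size, key=self.class_size.get)
        minority = 1 - majority
        if y == minority:
            self.minority_window.append((x, y))

        # First-level sampling: present the instance k ~ Poisson(lambda) times,
        # where lambda > 1 for the smaller class (oversampling) and 1 for the larger.
        lam = self.class_size[majority] / self.class_size[y]
        batch = [(x, y)] * self._poisson(lam)

        # Second-level sampling: under heavy imbalance, replay one stored
        # minority instance for every majority arrival.
        share = self.class_size[minority] / sum(self.class_size.values())
        if share < self.imbalance_threshold and y == majority and self.minority_window:
            batch.append(self.rng.choice(self.minority_window))
        return batch
```

Tying lambda to the current class ratio means majority instances are seen about once on average while minority instances are repeated roughly in proportion to the imbalance; the window then supplies extra minority examples when Poisson oversampling alone is not enough.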
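The abstract states only that a classifier-selection ensemble strategy adjusts the number of classifiers dynamically. One plausible way to do this, sketched below under assumed rules rather than the paper's exact procedure, is to score every member by its recent test-then-train accuracy, let only members above a threshold vote, and cap the pool by evicting the weakest member when a new one is added. `DynamicEnsembleSelector`, `ConstantClassifier`, and the `predict`/`learn` interface are hypothetical names.

```python
from collections import deque


class DynamicEnsembleSelector:
    """Hypothetical sketch of dynamic ensemble selection on a stream.

    Assumptions (not from the paper): each member is scored by its accuracy
    over its last `window` test-then-train predictions, only members at or
    above `min_accuracy` vote, and the pool is capped at `max_size` by
    evicting the weakest member when a new one is added.
    """

    def __init__(self, max_size=10, window=50, min_accuracy=0.5):
        self.max_size = max_size
        self.window = window
        self.min_accuracy = min_accuracy
        self.members = []  # list of (classifier, deque of recent 0/1 hits)

    @staticmethod
    def _accuracy(hits):
        # Members without a track record yet are treated optimistically
        return sum(hits) / len(hits) if hits else 1.0

    def add(self, clf):
        if len(self.members) >= self.max_size:
            worst = min(range(len(self.members)),
                        key=lambda i: self._accuracy(self.members[i][1]))
            self.members.pop(worst)
        self.members.append((clf, deque(maxlen=self.window)))

    def selected(self):
        return [clf for clf, hits in self.members
                if self._accuracy(hits) >= self.min_accuracy]

    def predict(self, x):
        # Fall back to the whole pool if no member passes the threshold
        voters = self.selected() or [clf for clf, _ in self.members]
        votes = {}
        for clf in voters:
            label = clf.predict(x)
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Test-then-train: score each member before it learns from (x, y)
        for clf, hits in self.members:
            hits.append(1 if clf.predict(x) == y else 0)
            clf.learn(x, y)


class ConstantClassifier:
    """Toy base learner used only to exercise the selector."""

    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return self.label

    def learn(self, x, y):
        pass
```

Because only the selected members vote, a pool that has grown after drift can still be outvoted by a small number of accurate members, and the size cap bounds the per-instance cost, which is one way such a strategy could improve running efficiency.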
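For drift detection the abstract names an adaptive window detector. Assuming it behaves like ADWIN (Bifet and Gavaldà's adaptive windowing), the core test can be sketched as below: the window of recent error indicators is split at every point, and the older part is dropped when the two sub-window means differ by at least eps_cut = sqrt(ln(4n/δ) / (2m)), where n is the window length and m = 1/(1/n0 + 1/n1) for sub-window sizes n0 and n1. `SimpleAdwin` and its parameters are illustrative; the published ADWIN2 variant compresses the window into exponential-histogram buckets instead of storing every value.

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified ADWIN-style drift detector (assumed stand-in, not the paper's code).

    Keeps the raw window instead of ADWIN2's bucket structure, so each
    update costs O(W); intended only for illustration on short streams.
    """

    def __init__(self, delta=0.002, max_window=1000):
        self.delta = delta
        self.window = deque(maxlen=max_window)

    def add(self, value):
        """Add one observation in [0, 1] (e.g. 1 = misclassified); return True on drift."""
        self.window.append(value)
        drift = False
        while self._shrink_once():
            drift = True
        return drift

    def _shrink_once(self):
        values = list(self.window)
        n, total, head = len(values), sum(values), 0.0
        for n0 in range(1, n):
            head += values[n0 - 1]
            n1 = n - n0
            mu0, mu1 = head / n0, (total - head) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # half the harmonic mean of n0 and n1
            eps_cut = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                for _ in range(n0):  # drop the older sub-window W0
                    self.window.popleft()
                return True
        return False
```

In a full pipeline, a True return would typically trigger resetting or replacing ensemble members; how the paper reacts to a detected drift is not stated in the abstract.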

Keywords: data stream; imbalanced classification; concept drift; Boosting; window sampling

CLC number: TP301.6