Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue 2: 88-96. doi: 10.6040/j.issn.1672-3961.0.2024.184
• Machine Learning and Data Mining •
ZHOU Yanbing (周彦冰), MA Shilun (马士伦), WEN Yimin* (文益民)
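Abstract: Traditional concept drift detection methods rely solely on the error rate, which makes their drift detection unreliable. To address this problem, a graph-based concept drift detection method is proposed. The method represents the current data distribution with a k-associated optimal graph and defines a per-sample drift rate that measures the inconsistency between the classifier and the current data distribution. The drift rates are converted into a bit stream, and a concept drift detector is then run on this bit stream to detect concept drift. Comparison and analysis against traditional error-rate-based concept drift detection methods show that the proposed method improves the accuracy of the base classifier by 1% to 5% on synthetic datasets and by 1% to 2% on real-world datasets. The proposed method effectively improves the accuracy of concept drift detection and helps the base classifier adapt better to concept drift.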
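The abstract does not say which detector reads the bit stream or how a drift rate is turned into a bit. As a minimal sketch under those gaps, the code below assumes a hypothetical threshold rule (`to_bit`: emit 1 when a sample's drift rate exceeds 0.5) and uses the DDM detector of Gama et al. (2004), which in its original form monitors a 0/1 stream of classification errors. The synthetic drift rates are illustrative only; the paper's actual drift-rate definition and detector may differ.

```python
import math


def to_bit(drift_rate: float, threshold: float = 0.5) -> int:
    """Hypothetical rule: emit 1 when a sample's drift rate exceeds `threshold`."""
    return 1 if drift_rate > threshold else 0


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a 0/1 stream.

    The original DDM is fed classification errors; in this sketch each bit is
    assumed to come from `to_bit` instead.
    """

    def __init__(self, min_samples: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0) -> None:
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0              # bits seen since the last reset
        self.p = 0.0            # running mean of the bits
        self.p_min = math.inf   # p at the point where p + s was smallest
        self.s_min = math.inf   # s at that same point

    def update(self, bit: int) -> str:
        """Consume one bit and return 'drift', 'warning' or 'stable'."""
        self.n += 1
        self.p += (bit - self.p) / self.n           # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"                          # too few bits to judge
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s       # new lowest p + s level
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                             # start tracking a new concept
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic drift rates: high for 1 in 10 samples, then for 1 in 2 after index 200.
rates = [0.9 if i % 10 == 9 else 0.1 for i in range(200)]
rates += [0.9 if i % 2 else 0.1 for i in range(200, 400)]

detector = DDM()
drift_points = [i for i, r in enumerate(rates)
                if detector.update(to_bit(r)) == "drift"]
print(drift_points)
```

DDM flags drift when p + s exceeds p_min + 3·s_min, where p is the running mean of the bits and s = sqrt(p(1 − p)/n). Because the detector only sees bits, replacing the error bit with a graph-derived bit leaves it unchanged, so in this sketch any detector that accepts a binary stream could be swapped in.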