Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (2): 88-96. doi: 10.6040/j.issn.1672-3961.0.2024.184

• Machine Learning and Data Mining •

基于图结构的概念漂移检测

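Concept drift detection based on graph structure

ZHOU Yanbing (周彦冰), MA Shilun (马士伦), WEN Yimin (文益民)*

  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
  • Published: 2025-04-15
  • About the authors: ZHOU Yanbing (b. 2001), male, native of Yiyang, Hunan; master's student; main research interest: machine learning. E-mail: 18074392274@163.com. *Corresponding author: WEN Yimin (b. 1969), male, native of Taojiang, Hunan; professor, doctoral supervisor, Ph.D.; main research interests: machine learning, data stream classification, media analysis, and data mining. E-mail: ymwen@guet.edu.cn
  • Funding:
    National Natural Science Foundation of China (62366011); Guangxi Key Research and Development Program (桂科AB21220023); Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2306)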

Abstract: Traditional concept drift detection methods rely solely on the error rate, which makes drift detection unreliable. To address this problem, a concept drift detection method based on graph structure is proposed. The method uses a k-associated optimal graph to represent the current data distribution and defines a drift rate for each sample that expresses the inconsistency between the classifier and the current data distribution. The drift rates form a bit stream, and a concept drift detector is run on this bit stream to detect concept drift. In comparison with traditional detection methods that use the error rate, the accuracy of the base classifier improves by 1% to 5% on artificial datasets and by 1% to 2% on real-world datasets. The proposed method effectively improves the accuracy of concept drift detection and helps base classifiers adapt better to concept drift.
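The pipeline described in the abstract (a per-sample drift indicator, a bit stream, and a detector run on that stream) can be illustrated with a rough sketch. The abstract does not give the drift rate formula or name the detector, so everything below is an assumption: a plain k-nearest-neighbor vote stands in for the k-associated optimal graph, the bit is set when the classifier's prediction disagrees with that vote, and a DDM-style threshold test (in the spirit of Gama et al., 2004) plays the role of the detector. The names `neighborhood_label`, `drift_bit`, `DDMStyleDetector`, and `first_drift` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def neighborhood_label(x, window, k=3):
    """Majority label among the k labeled samples in `window` closest to x.

    Assumption: a plain k-NN vote, used only as a stand-in for the graph
    neighborhood. It is NOT the k-associated optimal graph.
    """
    nearest = sorted(window, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def drift_bit(x, predicted, window, k=3):
    """Hypothetical drift indicator: 1 if the classifier's prediction
    disagrees with the labels around x in recent data, else 0."""
    return int(predicted != neighborhood_label(x, window, k))


class DDMStyleDetector:
    """DDM-style threshold test on a stream of 0/1 values."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.n = 0
        self.p = 0.0                      # running mean of the bits
        self.best = (math.inf, 0.0, 0.0)  # (p + s, p, s) at its lowest

    def update(self, bit):
        """Consume one bit; return True when drift is signalled."""
        self.n += 1
        self.p += (bit - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s <= self.best[0]:
            self.best = (self.p + s, self.p, s)
        _, p_min, s_min = self.best
        return self.p + s > p_min + self.drift_level * s_min


def first_drift(bits):
    """Index of the first bit at which the detector fires, or None."""
    detector = DDMStyleDetector()
    for i, bit in enumerate(bits):
        if detector.update(bit):
            return i
    return None
```

In use, each arriving sample would be scored with `drift_bit` against a window of recently labeled samples, and the resulting 0/1 values would be passed one at a time to `DDMStyleDetector.update`. Feeding the detector these bits instead of raw prediction errors is the only point of contact with the paper's idea; the actual drift rate and graph construction may differ substantially.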

Key words: data mining, data stream, concept drift, graph structure, k-associated optimal graph

CLC number:

  • TP181