Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue (6): 1-12. doi: 10.6040/j.issn.1672-3961.0.2024.269
• Machine Learning and Data Mining •
HUANG Fang, WANG Xin*, GAO Guohai, SHEN Lingzhen, FU Xun, FANG Yu
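Abstract: Results of traditional Top-k pattern mining often fail to meet users' actual needs. To address this, a Top-k frequent pattern mining method for graph data is proposed that fuses subjective and objective evaluation. Patterns are encoded with a representation technique based on the minimum DFS code. A graph patterns evaluation model (GPEM) built on a Siamese neural network learns the preference relations between pairs of patterns and predicts the subjective preference for each pattern. A pattern interestingness function that combines the subjective and objective measures then guides Top-k pattern mining. Experiments on 6 real-world graph datasets show that GPEM outperforms other models on multiple metrics, with accuracy reaching up to 93%.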
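The abstract does not state the form of the interestingness function, so the following is only a minimal sketch under stated assumptions: the objective score is taken to be the pattern's support normalized by the largest support, the subjective score is taken to be a value in [0, 1] returned by some preference predictor (standing in for GPEM), and the two are combined as a weighted sum with a hypothetical weight `alpha`. The names `top_k_patterns`, `supports`, `preference`, and `alpha` are illustrative and do not come from the paper.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def top_k_patterns(
    supports: Dict[str, int],
    preference: Callable[[str], float],
    k: int,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return the k patterns with the highest fused score.

    supports   : maps a pattern code (e.g. a minimum DFS code string) to its support.
    preference : hypothetical stand-in for a learned model; returns a score in [0, 1].
    alpha      : assumed weight of the objective (support) term in the weighted sum.
    """
    if not supports or k <= 0:
        return []
    max_support = max(supports.values())
    scored = []
    for code, support in supports.items():
        # Objective part: support scaled to [0, 1] (an assumption, not the paper's formula).
        objective = support / max_support if max_support > 0 else 0.0
        # Subjective part: predicted preference for this pattern.
        subjective = preference(code)
        scored.append((code, alpha * objective + (1 - alpha) * subjective))
    # Keep only the k highest-scoring patterns, best first.
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

With `alpha = 1.0` the ranking reduces to plain frequency-based Top-k, and lowering `alpha` lets the predicted preference reorder patterns that are frequent but of little interest to the user.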