
Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (6): 1-12. doi: 10.6040/j.issn.1672-3961.0.2024.269

• Machine Learning and Data Mining •

Mining Top-k frequent patterns for graphs based on subjective and objective metrics

HUANG Fang, WANG Xin*, GAO Guohai, SHEN Lingzhen, FU Xun, FANG Yu

  1. School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500, Sichuan, China
  • Published: 2025-12-22
  • About the authors: HUANG Fang (born 2000), female, from Meishan, Sichuan, master's student; research interests: data mining and machine learning. E-mail: huangfang1632021@163.com. *Corresponding author: WANG Xin (born 1981), male, from Yangzhou, Jiangsu, professor, doctoral supervisor, Ph.D.; research interests: machine learning, data mining, and artificial intelligence for oil and gas. E-mail: xinwang@swpu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62172102); Sichuan Science and Technology Innovation Talent Fund (2022JDRC0009)

Abstract: To address the problem that traditional Top-k pattern mining results often fail to meet users' practical needs, this paper proposes a Top-k frequent pattern mining approach for graph data that integrates subjective and objective evaluations. Patterns are encoded using a representation technique based on minimum DFS codes. A graph patterns evaluation model (GPEM), built on a siamese neural network, learns the preference relationships between pattern pairs and predicts the subjective preference for patterns. A pattern interestingness evaluation function that combines subjective and objective factors is designed to guide Top-k pattern mining. Experiments on six real graph datasets show that GPEM outperforms other models on several metrics, with accuracy of up to 93%.

Key words: siamese neural network, frequent pattern mining, interestingness evaluation function

CLC number:

  • TP311
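
The abstract describes an interestingness function that combines an objective measure with a predicted subjective preference to guide Top-k mining, but it does not give the formula. The following is only a minimal sketch of that idea, assuming a convex combination of normalized support and a preference score in [0, 1]. The `Pattern` class, the weight `alpha`, the normalization by maximum support, and the toy pattern codes are all hypothetical and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    code: str          # stand-in for a minimum DFS code (hypothetical string form)
    support: int       # objective measure: how often the pattern occurs
    preference: float  # subjective measure in [0, 1], e.g. a model's prediction


def interestingness(p, max_support, alpha=0.5):
    """Assumed scoring rule: alpha * normalized support + (1 - alpha) * preference."""
    objective = p.support / max_support if max_support > 0 else 0.0
    return alpha * objective + (1.0 - alpha) * p.preference


def top_k(patterns, k, alpha=0.5):
    """Return the k patterns with the highest combined score, best first."""
    max_support = max((p.support for p in patterns), default=0)
    return heapq.nlargest(
        k, patterns, key=lambda p: interestingness(p, max_support, alpha)
    )


# Toy candidates; the supports and preferences are made up for illustration.
patterns = [
    Pattern("A-B", support=100, preference=0.2),
    Pattern("A-B-C", support=60, preference=0.9),
    Pattern("B-C", support=80, preference=0.5),
    Pattern("C-D", support=20, preference=0.4),
]
best = top_k(patterns, k=2, alpha=0.5)
```

With `alpha=0.5` the toy example ranks `A-B-C` ahead of the more frequent `A-B`, because its higher preference score outweighs its lower support. Setting `alpha=1.0` falls back to ranking by support alone.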