您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2022, Vol. 52 ›› Issue (2): 107-117.doi: 10.6040/j.issn.1672-3961.0.2021.290

• • 上一篇    

基于索引列表的增量高效用模式挖掘算法

张妮,韩萌*,王乐,李小娟,程浩东   

  1. 北方民族大学计算机科学与工程学院, 宁夏 银川 750021
  • 发布日期:2022-04-20
  • 作者简介:张妮(1996— ),女,山西长治人,硕士研究生,主要研究方向为数据挖掘、高效用模式挖掘. E-mail:516537910@qq.com. *通信作者简介:韩萌(1982— ),女,河南商丘人,副教授,硕士生导师,主要研究方向为数据挖掘. E-mail:2003051@nmu.edu.cn
  • 基金资助:
    国家自然科学基金项目(62062004);宁夏自然科学基金项目(2020AAC03216);北方民族大学研究生创新项目(YCX21082)

Incremental high utility pattern mining algorithm based on index list

ZHANG Ni, HAN Meng*, WANG Le, LI Xiaojuan, CHENG Haodong   

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
  • Published:2022-04-20

摘要: 基于效用列表的高效用模式挖掘算法主要局限性在于创建和维护效用列表非常耗时,原因是建立了大量的列表,且列表之间连接操作成本较高。为了解决这个问题,提出一种索引列表结构,可以依据索引值快速访问并更新存储在列表中的信息,并提出一种基于索引列表的增量高效用模式挖掘算法,在挖掘过程中加快挖掘速度并减少内存消耗。试验结果表明,所提出算法在增量式挖掘过程中能有效减少时空性能消耗,且索引列表结构表现出比普通列表更优异的性能。在多种数据集中,运行时间平均提高43%,内存平均减少20%,且在不同的数据插入率条件下具有稳定的性能。

关键词: 数据挖掘, 增量挖掘, 高效用模式, 索引列表, 效用

中图分类号: 

  • TP301.6
[1] LIAO J, WU S, LIU A. High utility itemsets mining based on divide-and-conquer strategy[J]. Wireless Personal Communications, 2021, 116(3): 1639-1657.
[2] LIN C W, HONG T P, LAN G C, et al. Efficient updating of discovered highutility itemsets for transaction deletion in dynamic databases[J]. Advanced Engineering Informatics, 2015, 29(1): 16-27.
[3] FOURNIER-VIGER P, WU C W, ZIDA S, et al. FHM: faster highutility itemset mining using estimated utility co-occurrence pruning[C] // Processdings of21st International Symposium on Methodologies for Intelligent Systems. Roskilde, Denmark:Lecture, 2014:83-92.
[4] DUONG Q H, FOURNIER-VIGER P, RAMAMPIARO H, et al. Efficient high utility itemset mining using buffered utility-lists[J]. Applied Intelligence, 2018, 48(7): 1859-1877.
[5] YUN U, RYANG H, LEE G, et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan[J]. Knowledge-Based Systems, 2018, 124(15): 188-206.
[6] LIU J, JU X, ZHANG X, et al. Incremental mining of high utility patterns in one phase by absence and legacy-based pruning[J]. IEEE Access, 2019: 74168-74180.
[7] LIU J, KE W, FUNG B. Mining high utility patterns in one phase without generating candidates[J]. IEEE Transactions on Knowledge & Data Engineering, 2016, 28(5): 1245-1257.
[8] KRISHNAMOORTHY S. Pruning strategies for mining high utility itemsets[J]. Expert Systems with Applications, 2015, 42(5): 2371-2381.
[9] WU M T, LIN C W, TAMRAKAR A. High utility itemset mining with effective pruning strategies[J]. ACM Transactions on Knowledge Discovery from Data, 2019, 13(6): 1-22.
[10] PENG A Y, YUN S K, RIDDLE P. MHUIMiner: afast high utility itemset mining algorithm for sparse datasets[C] //Processdings of the Pacific-asia Conference on Knowledge Discovery & Data Mining.Jeju, Korea: Springer, 2017: 196-207.
[11] ZIDA S, FOURNIER-VIGER P, LIN C W, et al. EFIM: a fast and memory efficient algorithm for highutility itemset mining[J]. Knowledge & Information Systems, 2017, 51(2): 1-31.
[12] LIN C W, PIROUZ M, DJENOURI Y, et al. Incrementally updating the high average-utility patterns with pre-large concept[J]. Applied Intelligence, 2020, 50(12): 3788-3807.
[13] AHMED C F, TANBEER S K, JEONG B S, et al.Efficient tree structures for high utility pattern mining in incremental databases[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708-1721.
[14] LIU M, QU J. Mining high utility itemsets without candidate generation[C] //Processdings of the ACM International Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 55-64.
[15] RYANG H, YUN U. Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques[J]. Knowledge and Information Systems, 2016, 51(2): 627-659.
[16] LEE J, YUN U, LEE G, et al. Efficient incremental high utility pattern mining based on pre-large concept[J]. Engineering Applications of Artificial Intelligence: the International Journal of Intelligent Real-Time Automation, 2018, 72: 111-123.
[17] HONG T P, WANG C Y, TAO Y H, et al. A new incremental data mining algorithm using pre-large itemsets[J]. Intelligent Data Analysis, 2001, 5(2): 111-129.
[18] LIN C W, HONG T P, LAN G C, et al. Incrementally mining high utility patterns based on pre-large concept[J]. Applied Intelligence, 2014, 40(2): 343-357.
[19] LIN C W, GAN W, HONG T P, et al. An incremental high-utility mining algorithm with transaction insertion[J]. The Scientific World Journal, 2015:161564.
[20] UY A, HN A, GL A, et al. Efficient approach for incremental high utility pattern mining with indexed list structure[J]. Future Generation Computer Systems, 2019, 95: 221-239.
[21] 张春砚,韩萌,孙蕊,等.基于紧凑效用列表的增量高效用模式挖掘方法[J]. 山东大学学报(工学版), 2021, 51(2): 122-128. ZHANG Chunyan, HAN Meng, SUN Rui, et al. Incremental high utility pattern mining method based on compact utility list[J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 122-128.
[22] DAWAR S. A hybrid framework for mining high-utility itemsets in a sparse transaction database[J]. Applied Intelligence, 2017, 47(3): 809-827.
[23] KRISHNAMOORTHY S. HMiner: efficiently mining high utility itemsets[J]. Expert Systems with Applications, 2017, 90: 168-183.
[24] FOURNIER-VIGER P, GOMARIZ A, GUENICHE T, et al. SPMF: a java open-source pattern mining library[J]. Journal of Machine Learning Research, 2014, 15(1):3389-3393.
[1] 聂秀山,马玉玲,乔慧妍,郭杰,崔超然,于志云,刘兴波,尹义龙. 任务粒度视角下的学生成绩预测研究综述[J]. 山东大学学报 (工学版), 2022, 52(2): 1-14.
[2] 张春砚,韩萌,孙蕊,杜诗语,申明尧. 基于紧凑效用列表的增量高效用模式挖掘方法[J]. 山东大学学报 (工学版), 2021, 51(2): 122-128.
[3] 杨思,李思童,张进东,白羽. 高速光通信激光器带宽模型改进与并行计算优化[J]. 山东大学学报 (工学版), 2019, 49(1): 17-22, 29.
[4] 庞人铭,王波,叶昊,张海峰,李明亮. 基于PCA相似度和谱聚类相结合的高炉历史数据聚类[J]. 山东大学学报(工学版), 2017, 47(5): 143-149.
[5] 熊文涛,冯育强. 基于决策人满意度的区间UTA方法[J]. 山东大学学报(工学版), 2016, 46(2): 72-77.
[6] 周哲, 商琳. 一种基于动态词典和三支决策的情感分析方法[J]. 山东大学学报(工学版), 2015, 45(1): 19-23.
[7] 朱全银1,严云洋1,周培1,谷天峰2. 一种线性插补与自适应滑动窗口价格预测模型[J]. 山东大学学报(工学版), 2012, 42(5): 53-58.
[8] 宋威,刘文博,李晋宏. 基于动态裁剪频繁模式树的频繁项集并发挖掘算法[J]. 山东大学学报(工学版), 2011, 41(4): 49-55.
[9] 王爱国,李廉*,杨静,陈桂林. 一种基于Bayesian网络的网页推荐算法[J]. 山东大学学报(工学版), 2011, 41(4): 137-142.
[10] 琚春华1,2,陈之奇1*. 一种挖掘概念漂移数据流的模糊积分集成分类方法[J]. 山东大学学报(工学版), 2011, 41(4): 44-48.
[11] 张新猛,蒋盛益. 一种基于相似度概率的不确定分类数据聚类算法[J]. 山东大学学报(工学版), 2011, 41(3): 12-16.
[12] 孙静宇,余雪丽,陈俊杰, 李鲜花. 采样特异性因子及异常检测[J]. 山东大学学报(工学版), 2010, 40(5): 56-59.
[13] 董乃鹏 赵合计 SCHOMMER Christoph. 作者写作特征提取引擎[J]. 山东大学学报(工学版), 2009, 39(5): 27-31.
[14] 孙宇清,赵锐,姚青,史斌,刘佳 . 一种基于网格的障碍约束下空间聚类算法[J]. 山东大学学报(工学版), 2006, 36(3): 86-90 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!