您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2021, Vol. 51 ›› Issue (2): 122-128.doi: 10.6040/j.issn.1672-3961.0.2020.228

• • 上一篇    

基于紧凑效用列表的增量高效用模式挖掘方法

张春砚,韩萌*,孙蕊,杜诗语,申明尧   

  1. 北方民族大学计算机科学与工程学院, 宁夏 银川 750021
  • 发布日期:2021-04-16
  • 作者简介:张春砚(1995— ),女,河北张家口人,硕士研究生,主要研究方向为数据挖掘. E-mail:310300538@qq.com. *通信作者简介:韩萌(1982— ),女,河南商丘人,副教授,博士,硕士生导师,主要研究方向是数据挖掘. E-mail:2003051@nmu.edu.cn
  • 基金资助:
    国家自然科学基金资助项目(2062004);宁夏自然科学基金资助项目(2020AAC03216);北方民族大学研究生创新项目资助项目(YCX20061)

Incremental high utility pattern mining method based on compact utility list

ZHANG Chunyan, HAN Meng*, SUN Rui, DU Shiyu, SHEN Mingyao   

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
  • Published:2021-04-16

摘要: 针对存在大量冗余数据等问题,提出紧凑增量高效用挖掘算法。采用HUI-trie结构和紧凑效用列表两种结构,前者用于更新高效用项集的效用,后者用于存储信息,而无需生成任何候选项。这两种结构使算法无需再次分析整个数据集,就可以将增加的数据反映到以前的分析结果中,更有效地处理增量数据集。试验结果表明,该算法在各种数据集上,运行时间平均提高38%,内存平均减少32%,具有一定的可扩展性。

关键词: 增量数据集, 高效用模式, 紧凑效用列表, 候选项集, 效用

Abstract: Aiming at the problem of large amounts of redundant data, a compact incremental high utility mining algorithm was proposed. The HUI-trie structure and a compact utility list were used. The former was used to update the utility of the high utility itemsets, and the latter was used to store information without generating any candidates. These two structures enabled the algorithm to reflect the increased data into the previous analysis results without reanalyzing the entire data set, and processed incremental data sets more effectively. The test results showed that the algorithm had an average increase of 38% in running time and an average reduction in memory of 32% on various data sets, and it had certain scalability.

Key words: incremental datasets, high utility pattern, compact utility list, candidate itemsets, utility

中图分类号: 

  • TP301.6
[1] KRISTIAN S, RUDOLF S. An approach to cluster separability in a partition[J]. Information Sciences, 2015, 305(1): 208-218.
[2] SÁ J, JARBAS J M, ANDRÉ R, et al. Shape class-ification using line segment statistics[J]. Information Sciences, 2015, 305(1): 349-356.
[3] LIU J, WANG K, FUNG B C M. Direct discovery of high utility itemsets without candidate generation[C] //Processdings of IEEE International Conference on Data Mining. Brussels, Belgium: IEEE, 2012, 1: 984-989.
[4] RYANG H, YUN U. High utility pattern mining over data streams with sliding window technique[J]. Expert Systems with Applications, 2016, 57(9): 214-231.
[5] FOURNIER-VIGER P, LIN J C, GUENICHE T, et al. Efficient incremental high utility itemset mining[C] //Processdings of Ase Bigdata & Socialinformatics. Kaohsiung, China: ACM, 2015, 53: 1-6.
[6] KRISHNAMOORTHY S. HMiner: efficiently mining high utility itemsets[J]. Expert Systems with Applications, 2017, 90(30): 168-183.
[7] FOURNIER-VIGER P, WU C W, ZIDA S, et al. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning[J]. Lecture Notes in Computer Science, 2014, 8502: 83-92.
[8] LIU Y, LIAO W K, CHOUDHARY A. A two-phase algorithm for fast discovery of high utility itemsets[C] //Processdings of Pacific-asia Conference on Advances in Knowledge Discovery & Data Mining. Heidelberg, Germany: Springer-Verlag, 2005, 3518: 689-695.
[9] LIN C W, HONG T P, LU W H. An effective tree structure for mining high utility itemsets[J]. Expert Systems with Applications, 2011, 38(6): 7419-7424.
[10] HAN J, PEI J. Mining frequent patterns without can-didate generation[C] //Processdings of the ACM SIGMOD International Conference on Management of Data. Texas, USA: Sigmod Record, 2000, 29(2): 1-12.
[11] LIU M, QU J. Mining high utility itemsets without candidate generation[C] //Processdings of ACM Inter-national Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 55-64.
[12] ZIDA S, FOURNIER-VIGER P, LIN J C W, et al. EFIM: a highly efficient algorithm for high-utility itemset mining[C] //Processdings of Mexican Inter-national Conference on Artificial Intelligence. Cuernavaca, Mexico: Springer, 2015, 9413: 530-546.
[13] PENG A Y, KOH Y S, RIDDLE P. mHUIMiner: a fast high utility itemset mining algorithm for sparse datasets[C] //Processdings of PacificAsia Conference on Knowledge Discovery and Data Mining. Jeju, Korea: Springer, 2017: 196-207.
[14] LIN C W, GAN W, FOURNIER-VIGER P, et al. Efficient mining of high-utility itemsets using multiple minimum utility thresholds[J]. Knowledge Based Systems, 2016, 113(1): 100-115.
[15] GAN W, LIN C W, FOURNIER-VIGER P, et al. HUOPM: high-utility occupancy pattern mining[J]. IEEE Transactions on Cybernetics, 2020, 50(3): 1195-1208.
[16] GAN W, LIN C W, FOURNIER-VIGER P, et al. Mining high-utility itemsets with both positive and negative unit profits from uncertain databases[C] //Processdings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Jeju, Korea: Springer, 2017: 434-446.
[17] LIN C W, LAN G C, HONG T P. An incremental mining algorithm for high utility itemsets[J]. Expert Systems with Applications, 2012, 39(8): 7173-7180.
[18] LIN C W. Incrementally updating the discovered sequential patterns based on pre-large concept[J]. Intelligent Data Analysis, 2015, 19(5): 1071-1089.
[19] AHMED C F, TANBEER S K, JEONG B S, et al. Efficient tree structures for high utility pattern mining in incremental databases[J]. IEEE Transactions on Know-ledge and Data Engineering, 2009, 21(12): 1708-1721.
[20] YUN U, KIM D. Efficient algorithm for mining high average-utility itemsets in incremental transaction databases[J]. Applied Intelligence, 2017, 47(1): 1-18.
[21] YUN U, RYANG H, LEE G, et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan[J]. Knowledge Based Systems, 2017, 124(5): 188-206.
[22] LIN C W, GAN W, HONG T P, et al. Incrementally updating high-utility itemsets with transaction insertion[C] //Processdings of International Conference on Advanced Data Mining and Applications. Guilin, China: Springer, 2014, 8933: 44-56.
[1] 张春砚,韩萌,孙蕊,杜诗语,申明尧. 基于分区列表的增量闭合高效用模式挖掘方法[J]. 山东大学学报 (工学版), 2022, 52(4): 118-130.
[2] 张妮,韩萌,王乐,李小娟,程浩东. 基于索引列表的增量高效用模式挖掘算法[J]. 山东大学学报 (工学版), 2022, 52(2): 107-117.
[3] 熊文涛,冯育强. 基于决策人满意度的区间UTA方法[J]. 山东大学学报(工学版), 2016, 46(2): 72-77.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 王素玉,艾兴,赵军,李作丽,刘增文 . 高速立铣3Cr2Mo模具钢切削力建模及预测[J]. 山东大学学报(工学版), 2006, 36(1): 1 -5 .
[2] 李 侃 . 嵌入式相贯线焊接控制系统开发与实现[J]. 山东大学学报(工学版), 2008, 38(4): 37 -41 .
[3] 孔祥臻,刘延俊,王勇,赵秀华 . 气动比例阀的死区补偿与仿真[J]. 山东大学学报(工学版), 2006, 36(1): 99 -102 .
[4] 陈瑞,李红伟,田靖. 磁极数对径向磁轴承承载力的影响[J]. 山东大学学报(工学版), 2018, 48(2): 81 -85 .
[5] 李可,刘常春,李同磊 . 一种改进的最大互信息医学图像配准算法[J]. 山东大学学报(工学版), 2006, 36(2): 107 -110 .
[6] 季涛,高旭,孙同景,薛永端,徐丙垠 . 铁路10 kV自闭/贯通线路故障行波特征分析[J]. 山东大学学报(工学版), 2006, 36(2): 111 -116 .
[7] 浦剑1 ,张军平1 ,黄华2 . 超分辨率算法研究综述[J]. 山东大学学报(工学版), 2009, 39(1): 27 -32 .
[8] 王丽君,黄奇成,王兆旭 . 敏感性问题中的均方误差与模型比较[J]. 山东大学学报(工学版), 2006, 36(6): 51 -56 .
[9] 孙殿柱,朱昌志,李延瑞 . 散乱点云边界特征快速提取算法[J]. 山东大学学报(工学版), 2009, 39(1): 84 -86 .
[10] 杨发展1 ,艾兴1 ,赵军1 ,侯建锋2 . ZrO2含量对WC基复合材料的力学性能和微观结构的影响[J]. 山东大学学报(工学版), 2009, 39(1): 92 -95 .