基于紧凑效用列表的增量高效用模式挖掘方法

doi:10.6040/j.issn.1672-3961.0.2020.228

摘要/Abstract

摘要： 针对存在大量冗余数据等问题,提出紧凑增量高效用挖掘算法。采用HUI-trie结构和紧凑效用列表两种结构,前者用于更新高效用项集的效用,后者用于存储信息,而无需生成任何候选项。这两种结构使算法无需再次分析整个数据集,就可以将增加的数据反映到以前的分析结果中,更有效地处理增量数据集。试验结果表明,该算法在各种数据集上,运行时间平均提高38%,内存平均减少32%,具有一定的可扩展性。

关键词: 增量数据集, 高效用模式, 紧凑效用列表, 候选项集, 效用

Abstract: Aiming at the problem of large amounts of redundant data, a compact incremental high utility mining algorithm was proposed. The HUI-trie structure and a compact utility list were used. The former was used to update the utility of the high utility itemsets, and the latter was used to store information without generating any candidates. These two structures enabled the algorithm to reflect the increased data into the previous analysis results without reanalyzing the entire data set, and processed incremental data sets more effectively. The test results showed that the algorithm had an average increase of 38% in running time and an average reduction in memory of 32% on various data sets, and it had certain scalability.

Key words: incremental datasets, high utility pattern, compact utility list, candidate itemsets, utility

中图分类号:

TP301.6

张春砚,韩萌,孙蕊,杜诗语,申明尧. 基于紧凑效用列表的增量高效用模式挖掘方法[J]. 山东大学学报 (工学版), 2021, 51(2): 122-128.

ZHANG Chunyan, HAN Meng, SUN Rui, DU Shiyu, SHEN Mingyao. Incremental high utility pattern mining method based on compact utility list[J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 122-128.

参考文献

[1] KRISTIAN S, RUDOLF S. An approach to cluster separability in a partition[J]. Information Sciences, 2015, 305(1): 208-218.
[2] SÁ J, JARBAS J M, ANDRÉ R, et al. Shape class-ification using line segment statistics[J]. Information Sciences, 2015, 305(1): 349-356.
[3] LIU J, WANG K, FUNG B C M. Direct discovery of high utility itemsets without candidate generation[C] //Processdings of IEEE International Conference on Data Mining. Brussels, Belgium: IEEE, 2012, 1: 984-989.
[4] RYANG H, YUN U. High utility pattern mining over data streams with sliding window technique[J]. Expert Systems with Applications, 2016, 57(9): 214-231.
[5] FOURNIER-VIGER P, LIN J C, GUENICHE T, et al. Efficient incremental high utility itemset mining[C] //Processdings of Ase Bigdata & Socialinformatics. Kaohsiung, China: ACM, 2015, 53: 1-6.
[6] KRISHNAMOORTHY S. HMiner: efficiently mining high utility itemsets[J]. Expert Systems with Applications, 2017, 90(30): 168-183.
[7] FOURNIER-VIGER P, WU C W, ZIDA S, et al. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning[J]. Lecture Notes in Computer Science, 2014, 8502: 83-92.
[8] LIU Y, LIAO W K, CHOUDHARY A. A two-phase algorithm for fast discovery of high utility itemsets[C] //Processdings of Pacific-asia Conference on Advances in Knowledge Discovery & Data Mining. Heidelberg, Germany: Springer-Verlag, 2005, 3518: 689-695.
[9] LIN C W, HONG T P, LU W H. An effective tree structure for mining high utility itemsets[J]. Expert Systems with Applications, 2011, 38(6): 7419-7424.
[10] HAN J, PEI J. Mining frequent patterns without can-didate generation[C] //Processdings of the ACM SIGMOD International Conference on Management of Data. Texas, USA: Sigmod Record, 2000, 29(2): 1-12.
[11] LIU M, QU J. Mining high utility itemsets without candidate generation[C] //Processdings of ACM Inter-national Conference on Information and Knowledge Management. Hawaii, USA: ACM, 2012: 55-64.
[12] ZIDA S, FOURNIER-VIGER P, LIN J C W, et al. EFIM: a highly efficient algorithm for high-utility itemset mining[C] //Processdings of Mexican Inter-national Conference on Artificial Intelligence. Cuernavaca, Mexico: Springer, 2015, 9413: 530-546.
[13] PENG A Y, KOH Y S, RIDDLE P. mHUIMiner: a fast high utility itemset mining algorithm for sparse datasets[C] //Processdings of PacificAsia Conference on Knowledge Discovery and Data Mining. Jeju, Korea: Springer, 2017: 196-207.
[14] LIN C W, GAN W, FOURNIER-VIGER P, et al. Efficient mining of high-utility itemsets using multiple minimum utility thresholds[J]. Knowledge Based Systems, 2016, 113(1): 100-115.
[15] GAN W, LIN C W, FOURNIER-VIGER P, et al. HUOPM: high-utility occupancy pattern mining[J]. IEEE Transactions on Cybernetics, 2020, 50(3): 1195-1208.
[16] GAN W, LIN C W, FOURNIER-VIGER P, et al. Mining high-utility itemsets with both positive and negative unit profits from uncertain databases[C] //Processdings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Jeju, Korea: Springer, 2017: 434-446.
[17] LIN C W, LAN G C, HONG T P. An incremental mining algorithm for high utility itemsets[J]. Expert Systems with Applications, 2012, 39(8): 7173-7180.
[18] LIN C W. Incrementally updating the discovered sequential patterns based on pre-large concept[J]. Intelligent Data Analysis, 2015, 19(5): 1071-1089.
[19] AHMED C F, TANBEER S K, JEONG B S, et al. Efficient tree structures for high utility pattern mining in incremental databases[J]. IEEE Transactions on Know-ledge and Data Engineering, 2009, 21(12): 1708-1721.
[20] YUN U, KIM D. Efficient algorithm for mining high average-utility itemsets in incremental transaction databases[J]. Applied Intelligence, 2017, 47(1): 1-18.
[21] YUN U, RYANG H, LEE G, et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan[J]. Knowledge Based Systems, 2017, 124(5): 188-206.
[22] LIN C W, GAN W, HONG T P, et al. Incrementally updating high-utility itemsets with transaction insertion[C] //Processdings of International Conference on Advanced Data Mining and Applications. Guilin, China: Springer, 2014, 8933: 44-56.

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed