您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2018, Vol. 48 ›› Issue (6): 82-88.doi: 10.6040/j.issn.1672-3961.0.2018.206

• 机器学习与数据挖掘 • 上一篇    下一篇

一种基于压缩矩阵的改进Apriori算法

刘芳(),吴广潮*()   

  1. 华南理工大学数学学院, 广东 广州 510641
  • 收稿日期:2018-05-31 出版日期:2018-12-20 发布日期:2018-12-26
  • 通讯作者: 吴广潮 E-mail:913259707@qq.com;magchwu@scut.edu.com
  • 作者简介:刘芳(1996—),女,河南驻马店人,硕士研究生,主要研究方向为数据挖掘和机器学习. E-mail:913259707@qq.com
  • 基金资助:
    国家自然科学基金面上项目(61370102)

An improved Apriori algorithm based on compression matrix

Fang LIU(),Guangchao WU*()   

  1. School of Mathematics, South China University of Technology, Guangzhou 510641, Guangdong, China
  • Received:2018-05-31 Online:2018-12-20 Published:2018-12-26
  • Contact: Guangchao WU E-mail:913259707@qq.com;magchwu@scut.edu.com
  • Supported by:
    国家自然科学基金面上项目(61370102)

摘要:

针对Apriori算法需要频繁扫描事务数据库并且会产生大量候选项集的不足,提出一种改进的Apriori算法。采用矩阵压缩的思想,增加了3个向量,分别表示事务矩阵中各行各列1的个数,即事务项目数和项目支持数,以及重复的事务出现次数,从而减小矩阵规模,避免多次扫描数据库。在矩阵运算过程中,对矩阵中事务项目数和项目支持数进行排序并删除不满足条件的项集和非频繁项集,形成新的矩阵结构,提高空间效率。对改进后的算法进行性能分析和试验分析发现,该算法相对于Apriori算法具有更高的效率,同时可以更有效的挖掘出频繁项集。

关键词: 关联规则, Apriori算法, 频繁项集, 压缩矩阵, 计算效率

Abstract:

An improved Apriori algorithm was proposed to solve the problem that the tradditional Apriori algorithm needed scan transaction database frequently and generate a large number of candidate item sets. This algorithm adopted the idea of matrix compression and added three vectors that were respectively used to represent the number of 1 in rows and columns in the transaction matrix, namely, number of transaction items and number of project support, and number of repeated transaction occurrence, so as to reduce the matrix size and avoid scanning the database multiple times. In the process of matrix operation, the number of transaction items and the number of project support in the matrix were sorted and the unsatisfied item sets and infrequent item sets were deleted to form a new matrix structure and improve the spatial efficiency of space. Performance analysis and experimental analysis of the improved algorithm showed that the algorithm was more efficient than Apriori algorithm and could mine frequent item sets more effectively.

Key words: association rules, Apriori algorithm, frequent itemsets, compression matrix, computational efficiency

中图分类号: 

  • TP391

表1

事务数据库"

事务 项目集
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3, I4
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3, I4, I5
T10 I2, I3

图1

Thyroid数据集运行时间"

图2

Thyroid数据集生成候选项集个数"

图3

Mushroom数据集运行时间"

图4

Mushroom数据集生成候选项集个数"

图5

不同事务数的运行时间"

1 ZHONG R, WANG H. Research of commonly used association rules mining algorithm in data mining[C]//International Conference on Internet Computing and Information Services. Hong Kong: IEEE Computer Society, 2011: 219-222.
2 AGRAWAL R, IMIELIENSKIN T, SWAMI A. Mining association rules between sets items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington, USA: ACM Press, 1993: 207-216.
3 HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, USA: ACM Press, 2000: 1-12.
4 陈兴蜀, 张帅, 童浩, 等. 基于布尔矩阵和MapReduce的FP-Growth算法[J]. 华南理工大学学报(自然科学版), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
CHEN Xingshu , ZHANG Shuai , TONG Hao , et al. FP-growth algorithm based on boolean matrix and mapreduce[J]. Journal of South China University of Technology(Natural Science Edition), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
5 PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[C]//Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, USA: ACM Press, 1995, 24(2): 175-186.
6 PARK J S , CHEN M S , YU P S . Using a hash-based method with transaction trimming for mining association rules[J]. IEEE Transactions on Knowledge and Data Engineering, 1997, 9 (5): 813- 825.
doi: 10.1109/69.634757
7 赵学健, 孙知信, 袁源, 等. 一种正交链表存储的改进Apriori算法[J]. 小型微型计算机系统, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
ZHAO Xuejian , SUN Zhixin , YUAN Yuan , et al. An improved Apriori algorithm based on orthogonal list storage[J]. Journal of Chinese Computer Systems, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
8 WANG Feng, LI Yonghua. An improved apriori algorithm based on the matrix[C]//Proceedings of 2008 International Seminar on Future BioMedical Information Engineering. Wuhan: IEEE Computer Society, 2008: 152-155.
9 LUO D , TAOSHEN L I , COMPUTER S O , et al. Research on improved apriori algorithm based on compressed matrix[J]. Computer Science, 2013, 40 (12): 75- 80.
10 任伟建, 于博文. 基于矩阵约简的Apriori算法改进[J]. 计算机与现代化, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
REN Weijian , YU Bowen . Improved apriori algorithm based on matrix reduction[J]. Computer & Modernization, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
11 杨秋翔, 孙涵. 基于权值向量矩阵约简的Apriori算法[J]. 计算机工程与设计, 2018, 39 (3): 690- 693.
YANG Qiuxiang , SUN Han . Apriori algorithm based on weight vector matrix reduction[J]. Computer Engineering & Design, 2018, 39 (3): 690- 693.
12 边根庆, 王月. 一种基于矩阵和权重改进的Apriori算法[J]. 微电子学与计算机, 2017, 34 (1): 136- 140.
BIAN Genqing , WANG Yue . An improved Apriori algorithm based matrix and weight[J]. Microelectronics & Computer, 2017, 34 (1): 136- 140.
13 张文东, 尹金焕, 贾晓飞, 等. 基于向量的频繁项集挖掘算法研究[J]. 山东大学学报(理学版), 2011, 46 (3): 31- 34.
ZHANG Wendong , YIN Jinhuan , JIA Xiaofei , et al. Research of a frequent itemsets mining algorithm based on vector[J]. Journal of Shandong University (Natural Science), 2011, 46 (3): 31- 34.
14 黄剑, 李明奇, 郭文强. 基于Hadoop的Apriori改进算法研究[J]. 计算机科学, 2017, 44 (7): 262- 266.
HUANG Jian , LI Mingqi , GUO Wenqiang . Research of Apriori improved algorithm based on hadoop[J]. Computer Science, 2017, 44 (7): 262- 266.
15 YU H , WEN J , WANG H , et al. An improved apriori algorithm based on the boolean matrix and hadoop[J]. Procedia Engineering, 2011, 15 (1): 1827- 1831.
16 吕桃霞, 刘培玉. 一种基于矩阵的强关联规则生成算法[J]. 计算机应用研究, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
LYU Taoxia , LIU Peiyu . Algorithm for generating strong association rules based on matrix[J]. Application Research of Computers, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
17 KHARE N, ADLAKHA N, PARDASANI K.R. An algorithm for mining multidimensional association rules using boolean matrix[C]//2010 International Conference on Recent Trends in Information, Telecommunication and Computing. Kerala, India: IEEE Computer Society, 2010: 95-99.
18 何月顺.关联规则挖掘技术的研究及应用[D].南京:南京航空航天大学, 2010.
HE Yueshun. Research and application of association rule mining technology[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2010.
19 LIU Huizhen, DAI Shangping, JIANG Hong. Quantitative association rules mining algorithm based on matrix[C]//International Conference on Computational Intelligence and Software Engineering. Wuhan: IEEE, 2009: 1-4.
[1] 姚华传, 王丽珍, 吴萍萍, 邹目权. AC_SAR:基于强关联规则的可行动分簇算法[J]. 山东大学学报(工学版), 2014, 44(6): 38-46.
[2] 宋威,刘文博,李晋宏. 基于动态裁剪频繁模式树的频繁项集并发挖掘算法[J]. 山东大学学报(工学版), 2011, 41(4): 49-55.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 李 侃 . 嵌入式相贯线焊接控制系统开发与实现[J]. 山东大学学报(工学版), 2008, 38(4): 37 -41 .
[2] 来翔 . 用胞映射方法讨论一类MKdV方程[J]. 山东大学学报(工学版), 2006, 36(1): 87 -92 .
[3] 余嘉元1 , 田金亭1 , 朱强忠2 . 计算智能在心理学中的应用[J]. 山东大学学报(工学版), 2009, 39(1): 1 -5 .
[4] 陈瑞,李红伟,田靖. 磁极数对径向磁轴承承载力的影响[J]. 山东大学学报(工学版), 2018, 48(2): 81 -85 .
[5] 王波,王宁生 . 机电装配体拆卸序列的自动生成及组合优化[J]. 山东大学学报(工学版), 2006, 36(2): 52 -57 .
[6] 张英,郎咏梅,赵玉晓,张鉴达,乔鹏,李善评 . 由EGSB厌氧颗粒污泥培养好氧颗粒污泥的工艺探讨[J]. 山东大学学报(工学版), 2006, 36(4): 56 -59 .
[7] Yue Khing Toh1 , XIAO Wendong2 , XIE Lihua1 . 基于无线传感器网络的分散目标跟踪:实际测试平台的开发应用(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 50 -56 .
[8] 孙炜伟,王玉振. 考虑饱和的发电机单机无穷大系统有限增益镇定[J]. 山东大学学报(工学版), 2009, 39(1): 69 -76 .
[9] 王,张艳宁,申家振,刘俊成 . 基于信息测度和支持向量机的图像边缘检测[J]. 山东大学学报(工学版), 2006, 36(3): 95 -99 .
[10] 孙玉利,李法德,左敦稳,戚美 . 直立分室式流体连续通电加热系统的升温特性[J]. 山东大学学报(工学版), 2006, 36(6): 19 -23 .