您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2018, Vol. 48 ›› Issue (6): 82-88.doi: 10.6040/j.issn.1672-3961.0.2018.206

• 机器学习与数据挖掘 • 上一篇    下一篇

一种基于压缩矩阵的改进Apriori算法

刘芳(),吴广潮*()   

  1. 华南理工大学数学学院, 广东 广州 510641
  • 收稿日期:2018-05-31 出版日期:2018-12-20 发布日期:2018-12-26
  • 通讯作者: 吴广潮 E-mail:913259707@qq.com;magchwu@scut.edu.com
  • 作者简介:刘芳(1996—),女,河南驻马店人,硕士研究生,主要研究方向为数据挖掘和机器学习. E-mail:913259707@qq.com
  • 基金资助:
    国家自然科学基金面上项目(61370102)

An improved Apriori algorithm based on compression matrix

Fang LIU(),Guangchao WU*()   

  1. School of Mathematics, South China University of Technology, Guangzhou 510641, Guangdong, China
  • Received:2018-05-31 Online:2018-12-20 Published:2018-12-26
  • Contact: Guangchao WU E-mail:913259707@qq.com;magchwu@scut.edu.com
  • Supported by:
    国家自然科学基金面上项目(61370102)

摘要:

针对Apriori算法需要频繁扫描事务数据库并且会产生大量候选项集的不足,提出一种改进的Apriori算法。采用矩阵压缩的思想,增加了3个向量,分别表示事务矩阵中各行各列1的个数,即事务项目数和项目支持数,以及重复的事务出现次数,从而减小矩阵规模,避免多次扫描数据库。在矩阵运算过程中,对矩阵中事务项目数和项目支持数进行排序并删除不满足条件的项集和非频繁项集,形成新的矩阵结构,提高空间效率。对改进后的算法进行性能分析和试验分析发现,该算法相对于Apriori算法具有更高的效率,同时可以更有效的挖掘出频繁项集。

关键词: 关联规则, Apriori算法, 频繁项集, 压缩矩阵, 计算效率

Abstract:

An improved Apriori algorithm was proposed to solve the problem that the tradditional Apriori algorithm needed scan transaction database frequently and generate a large number of candidate item sets. This algorithm adopted the idea of matrix compression and added three vectors that were respectively used to represent the number of 1 in rows and columns in the transaction matrix, namely, number of transaction items and number of project support, and number of repeated transaction occurrence, so as to reduce the matrix size and avoid scanning the database multiple times. In the process of matrix operation, the number of transaction items and the number of project support in the matrix were sorted and the unsatisfied item sets and infrequent item sets were deleted to form a new matrix structure and improve the spatial efficiency of space. Performance analysis and experimental analysis of the improved algorithm showed that the algorithm was more efficient than Apriori algorithm and could mine frequent item sets more effectively.

Key words: association rules, Apriori algorithm, frequent itemsets, compression matrix, computational efficiency

中图分类号: 

  • TP391

表1

事务数据库"

事务 项目集
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3, I4
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3, I4, I5
T10 I2, I3

图1

Thyroid数据集运行时间"

图2

Thyroid数据集生成候选项集个数"

图3

Mushroom数据集运行时间"

图4

Mushroom数据集生成候选项集个数"

图5

不同事务数的运行时间"

1 ZHONG R, WANG H. Research of commonly used association rules mining algorithm in data mining[C]//International Conference on Internet Computing and Information Services. Hong Kong: IEEE Computer Society, 2011: 219-222.
2 AGRAWAL R, IMIELIENSKIN T, SWAMI A. Mining association rules between sets items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington, USA: ACM Press, 1993: 207-216.
3 HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, USA: ACM Press, 2000: 1-12.
4 陈兴蜀, 张帅, 童浩, 等. 基于布尔矩阵和MapReduce的FP-Growth算法[J]. 华南理工大学学报(自然科学版), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
CHEN Xingshu , ZHANG Shuai , TONG Hao , et al. FP-growth algorithm based on boolean matrix and mapreduce[J]. Journal of South China University of Technology(Natural Science Edition), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
5 PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[C]//Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, USA: ACM Press, 1995, 24(2): 175-186.
6 PARK J S , CHEN M S , YU P S . Using a hash-based method with transaction trimming for mining association rules[J]. IEEE Transactions on Knowledge and Data Engineering, 1997, 9 (5): 813- 825.
doi: 10.1109/69.634757
7 赵学健, 孙知信, 袁源, 等. 一种正交链表存储的改进Apriori算法[J]. 小型微型计算机系统, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
ZHAO Xuejian , SUN Zhixin , YUAN Yuan , et al. An improved Apriori algorithm based on orthogonal list storage[J]. Journal of Chinese Computer Systems, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
8 WANG Feng, LI Yonghua. An improved apriori algorithm based on the matrix[C]//Proceedings of 2008 International Seminar on Future BioMedical Information Engineering. Wuhan: IEEE Computer Society, 2008: 152-155.
9 LUO D , TAOSHEN L I , COMPUTER S O , et al. Research on improved apriori algorithm based on compressed matrix[J]. Computer Science, 2013, 40 (12): 75- 80.
10 任伟建, 于博文. 基于矩阵约简的Apriori算法改进[J]. 计算机与现代化, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
REN Weijian , YU Bowen . Improved apriori algorithm based on matrix reduction[J]. Computer & Modernization, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
11 杨秋翔, 孙涵. 基于权值向量矩阵约简的Apriori算法[J]. 计算机工程与设计, 2018, 39 (3): 690- 693.
YANG Qiuxiang , SUN Han . Apriori algorithm based on weight vector matrix reduction[J]. Computer Engineering & Design, 2018, 39 (3): 690- 693.
12 边根庆, 王月. 一种基于矩阵和权重改进的Apriori算法[J]. 微电子学与计算机, 2017, 34 (1): 136- 140.
BIAN Genqing , WANG Yue . An improved Apriori algorithm based matrix and weight[J]. Microelectronics & Computer, 2017, 34 (1): 136- 140.
13 张文东, 尹金焕, 贾晓飞, 等. 基于向量的频繁项集挖掘算法研究[J]. 山东大学学报(理学版), 2011, 46 (3): 31- 34.
ZHANG Wendong , YIN Jinhuan , JIA Xiaofei , et al. Research of a frequent itemsets mining algorithm based on vector[J]. Journal of Shandong University (Natural Science), 2011, 46 (3): 31- 34.
14 黄剑, 李明奇, 郭文强. 基于Hadoop的Apriori改进算法研究[J]. 计算机科学, 2017, 44 (7): 262- 266.
HUANG Jian , LI Mingqi , GUO Wenqiang . Research of Apriori improved algorithm based on hadoop[J]. Computer Science, 2017, 44 (7): 262- 266.
15 YU H , WEN J , WANG H , et al. An improved apriori algorithm based on the boolean matrix and hadoop[J]. Procedia Engineering, 2011, 15 (1): 1827- 1831.
16 吕桃霞, 刘培玉. 一种基于矩阵的强关联规则生成算法[J]. 计算机应用研究, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
LYU Taoxia , LIU Peiyu . Algorithm for generating strong association rules based on matrix[J]. Application Research of Computers, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
17 KHARE N, ADLAKHA N, PARDASANI K.R. An algorithm for mining multidimensional association rules using boolean matrix[C]//2010 International Conference on Recent Trends in Information, Telecommunication and Computing. Kerala, India: IEEE Computer Society, 2010: 95-99.
18 何月顺.关联规则挖掘技术的研究及应用[D].南京:南京航空航天大学, 2010.
HE Yueshun. Research and application of association rule mining technology[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2010.
19 LIU Huizhen, DAI Shangping, JIANG Hong. Quantitative association rules mining algorithm based on matrix[C]//International Conference on Computational Intelligence and Software Engineering. Wuhan: IEEE, 2009: 1-4.
[1] 姚华传, 王丽珍, 吴萍萍, 邹目权. AC_SAR:基于强关联规则的可行动分簇算法[J]. 山东大学学报(工学版), 2014, 44(6): 38-46.
[2] 宋威,刘文博,李晋宏. 基于动态裁剪频繁模式树的频繁项集并发挖掘算法[J]. 山东大学学报(工学版), 2011, 41(4): 49-55.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 李 侃 . 嵌入式相贯线焊接控制系统开发与实现[J]. 山东大学学报(工学版), 2008, 38(4): 37 -41 .
[2] 施来顺,万忠义 . 新型甜菜碱型沥青乳化剂的合成与性能测试[J]. 山东大学学报(工学版), 2008, 38(4): 112 -115 .
[3] 李梁,罗奇鸣,陈恩红. 对象级搜索中基于图的对象排序模型(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 15 -21 .
[4] 陈瑞,李红伟,田靖. 磁极数对径向磁轴承承载力的影响[J]. 山东大学学报(工学版), 2018, 48(2): 81 -85 .
[5] 王波,王宁生 . 机电装配体拆卸序列的自动生成及组合优化[J]. 山东大学学报(工学版), 2006, 36(2): 52 -57 .
[6] 季涛,高旭,孙同景,薛永端,徐丙垠 . 铁路10 kV自闭/贯通线路故障行波特征分析[J]. 山东大学学报(工学版), 2006, 36(2): 111 -116 .
[7] 浦剑1 ,张军平1 ,黄华2 . 超分辨率算法研究综述[J]. 山东大学学报(工学版), 2009, 39(1): 27 -32 .
[8] 秦通,孙丰荣*,王丽梅,王庆浩,李新彩. 基于极大圆盘引导的形状插值实现三维表面重建[J]. 山东大学学报(工学版), 2010, 40(3): 1 -5 .
[9] 刘文亮,朱维红,陈涤,张泓泉. 基于雷达图像的运动目标形态检测及跟踪技术[J]. 山东大学学报(工学版), 2010, 40(3): 31 -36 .
[10] 王丽君,黄奇成,王兆旭 . 敏感性问题中的均方误差与模型比较[J]. 山东大学学报(工学版), 2006, 36(6): 51 -56 .