Journal of Shandong University(Engineering Science) ›› 2018, Vol. 48 ›› Issue (6): 82-88.doi: 10.6040/j.issn.1672-3961.0.2018.206

• Machine Learning & Data Mining • Previous Articles     Next Articles

An improved Apriori algorithm based on compression matrix

Fang LIU(),Guangchao WU*()   

  1. School of Mathematics, South China University of Technology, Guangzhou 510641, Guangdong, China
  • Received:2018-05-31 Online:2018-12-20 Published:2018-12-26
  • Contact: Guangchao WU E-mail:913259707@qq.com;magchwu@scut.edu.com
  • Supported by:
    国家自然科学基金面上项目(61370102)

Abstract:

An improved Apriori algorithm was proposed to solve the problem that the tradditional Apriori algorithm needed scan transaction database frequently and generate a large number of candidate item sets. This algorithm adopted the idea of matrix compression and added three vectors that were respectively used to represent the number of 1 in rows and columns in the transaction matrix, namely, number of transaction items and number of project support, and number of repeated transaction occurrence, so as to reduce the matrix size and avoid scanning the database multiple times. In the process of matrix operation, the number of transaction items and the number of project support in the matrix were sorted and the unsatisfied item sets and infrequent item sets were deleted to form a new matrix structure and improve the spatial efficiency of space. Performance analysis and experimental analysis of the improved algorithm showed that the algorithm was more efficient than Apriori algorithm and could mine frequent item sets more effectively.

Key words: association rules, Apriori algorithm, frequent itemsets, compression matrix, computational efficiency

CLC Number: 

  • TP391

Table 1

Transaction database"

事务 项目集
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3, I4
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3, I4, I5
T10 I2, I3

Fig.1

Running time of thyroid data set"

Fig.2

Number of candidate itemsets generated by thyroid"

Fig.3

Running time of mushroom data set"

Fig.4

Number of candidate itemsets generated by mushroom"

Fig.5

Running time of different transaction numbers"

1 ZHONG R, WANG H. Research of commonly used association rules mining algorithm in data mining[C]//International Conference on Internet Computing and Information Services. Hong Kong: IEEE Computer Society, 2011: 219-222.
2 AGRAWAL R, IMIELIENSKIN T, SWAMI A. Mining association rules between sets items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington, USA: ACM Press, 1993: 207-216.
3 HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, USA: ACM Press, 2000: 1-12.
4 陈兴蜀, 张帅, 童浩, 等. 基于布尔矩阵和MapReduce的FP-Growth算法[J]. 华南理工大学学报(自然科学版), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
CHEN Xingshu , ZHANG Shuai , TONG Hao , et al. FP-growth algorithm based on boolean matrix and mapreduce[J]. Journal of South China University of Technology(Natural Science Edition), 2014, 42 (1): 135- 141.
doi: 10.3969/j.issn.1000-565X.2014.01.023
5 PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[C]//Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, USA: ACM Press, 1995, 24(2): 175-186.
6 PARK J S , CHEN M S , YU P S . Using a hash-based method with transaction trimming for mining association rules[J]. IEEE Transactions on Knowledge and Data Engineering, 1997, 9 (5): 813- 825.
doi: 10.1109/69.634757
7 赵学健, 孙知信, 袁源, 等. 一种正交链表存储的改进Apriori算法[J]. 小型微型计算机系统, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
ZHAO Xuejian , SUN Zhixin , YUAN Yuan , et al. An improved Apriori algorithm based on orthogonal list storage[J]. Journal of Chinese Computer Systems, 2016, 37 (10): 2291- 2295.
doi: 10.3969/j.issn.1000-1220.2016.10.028
8 WANG Feng, LI Yonghua. An improved apriori algorithm based on the matrix[C]//Proceedings of 2008 International Seminar on Future BioMedical Information Engineering. Wuhan: IEEE Computer Society, 2008: 152-155.
9 LUO D , TAOSHEN L I , COMPUTER S O , et al. Research on improved apriori algorithm based on compressed matrix[J]. Computer Science, 2013, 40 (12): 75- 80.
10 任伟建, 于博文. 基于矩阵约简的Apriori算法改进[J]. 计算机与现代化, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
REN Weijian , YU Bowen . Improved apriori algorithm based on matrix reduction[J]. Computer & Modernization, 2015, (9): 1- 5.
doi: 10.3969/j.issn.1006-2475.2015.09.001
11 杨秋翔, 孙涵. 基于权值向量矩阵约简的Apriori算法[J]. 计算机工程与设计, 2018, 39 (3): 690- 693.
YANG Qiuxiang , SUN Han . Apriori algorithm based on weight vector matrix reduction[J]. Computer Engineering & Design, 2018, 39 (3): 690- 693.
12 边根庆, 王月. 一种基于矩阵和权重改进的Apriori算法[J]. 微电子学与计算机, 2017, 34 (1): 136- 140.
BIAN Genqing , WANG Yue . An improved Apriori algorithm based matrix and weight[J]. Microelectronics & Computer, 2017, 34 (1): 136- 140.
13 张文东, 尹金焕, 贾晓飞, 等. 基于向量的频繁项集挖掘算法研究[J]. 山东大学学报(理学版), 2011, 46 (3): 31- 34.
ZHANG Wendong , YIN Jinhuan , JIA Xiaofei , et al. Research of a frequent itemsets mining algorithm based on vector[J]. Journal of Shandong University (Natural Science), 2011, 46 (3): 31- 34.
14 黄剑, 李明奇, 郭文强. 基于Hadoop的Apriori改进算法研究[J]. 计算机科学, 2017, 44 (7): 262- 266.
HUANG Jian , LI Mingqi , GUO Wenqiang . Research of Apriori improved algorithm based on hadoop[J]. Computer Science, 2017, 44 (7): 262- 266.
15 YU H , WEN J , WANG H , et al. An improved apriori algorithm based on the boolean matrix and hadoop[J]. Procedia Engineering, 2011, 15 (1): 1827- 1831.
16 吕桃霞, 刘培玉. 一种基于矩阵的强关联规则生成算法[J]. 计算机应用研究, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
LYU Taoxia , LIU Peiyu . Algorithm for generating strong association rules based on matrix[J]. Application Research of Computers, 2011, 28 (4): 1301- 1303.
doi: 10.3969/j.issn.1001-3695.2011.04.029
17 KHARE N, ADLAKHA N, PARDASANI K.R. An algorithm for mining multidimensional association rules using boolean matrix[C]//2010 International Conference on Recent Trends in Information, Telecommunication and Computing. Kerala, India: IEEE Computer Society, 2010: 95-99.
18 何月顺.关联规则挖掘技术的研究及应用[D].南京:南京航空航天大学, 2010.
HE Yueshun. Research and application of association rule mining technology[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2010.
19 LIU Huizhen, DAI Shangping, JIANG Hong. Quantitative association rules mining algorithm based on matrix[C]//International Conference on Computational Intelligence and Software Engineering. Wuhan: IEEE, 2009: 1-4.
[1] YAO Huachuan, WANG Lizhen, WU Pingping, ZOU Muquan. AC_SAR: actionable clustering algorithm based on strong association rule [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2014, 44(6): 38-46.
[2] YAN Xuan-hui, ZENG Qing-sheng*, SHU Cai-liang. A co-evolution model integrated with an immune mechanism [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(1): 34-44.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LI Kan . Empolder and implement of the embedded weld control system[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2008, 38(4): 37 -41 .
[2] SHI Lai-shun,WAN Zhong-yi . Synthesis and performance evaluation of a novel betaine-type asphalt emulsifier[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2008, 38(4): 112 -115 .
[3] LI Liang, LUO Qiming, CHEN Enhong. Graph-based ranking model for object-level search
[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(1): 15 -21 .
[4] CHEN Rui, LI Hongwei, TIAN Jing. The relationship between the number of magnetic poles and the bearing capacity of radial magnetic bearing[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(2): 81 -85 .
[5] WANG Bo,WANG Ning-sheng . Automatic generation and combinatory optimization of disassembly sequence for mechanical-electric assembly[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(2): 52 -57 .
[6] JI Tao,GAO Xu/sup>,SUN Tong-jing,XUE Yong-duan/sup>,XU Bing-yin/sup> . Characteristic analysis of fault generated traveling waves in 10 Kv automatic blocking and continuous power transmission lines[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(2): 111 -116 .
[7] . [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(1): 27 -32 .
[8] QIN Tong, SUN Fengrong*, WANG Limei, WANG Qinghao, LI Xincai. 3D surface reconstruction using the shape based interpolation guided by maximal discs[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2010, 40(3): 1 -5 .
[9] LIU Wen-liang, ZHU Wei-hong, CHEN Di, ZHANG Hong-quan. Detection and tracking of moving targets using the morphology match in radar images[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2010, 40(3): 31 -36 .
[10] WANG Li-ju,HUANG Qi-cheng,WANG Zhao-xu . [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(6): 51 -56 .