您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2019, Vol. 49 ›› Issue (2): 122-130.doi: 10.6040/j.issn.1672-3961.0.2018.211

• 机器学习与数据挖掘 • 上一篇    

基于多目标协同进化遗传算法的规则提取方法

张中伟(),梅红岩*(),周军,贾慧萍   

  1. 辽宁工业大学电子与信息工程学院, 辽宁 锦州 121000
  • 收稿日期:2018-05-25 出版日期:2019-04-20 发布日期:2019-04-19
  • 通讯作者: 梅红岩 E-mail:zzwzzwcool@126.com;liaoning_mhy@126.com
  • 作者简介:张中伟(1992—),男,山西朔州人,硕士研究生,主要研究方向为数据挖掘与知识发现.E-mail:zzwzzwcool@126.com
  • 基金资助:
    辽宁省科学技术计划项目面上项目(2015020089);辽宁省科学技术计划项目面上项目(201602372)

A rule extraction method based on multi-objective co-evolutionarygenetic algorithm

Zhongwei ZHANG(),Hongyan MEI*(),Jun ZHOU,Huiping JIA   

  1. School of Electronics & Information Engineering, Liaoning University of Technology, Jinzhou 121000, Liaoning, China
  • Received:2018-05-25 Online:2019-04-20 Published:2019-04-19
  • Contact: Hongyan MEI E-mail:zzwzzwcool@126.com;liaoning_mhy@126.com
  • Supported by:
    辽宁省科学技术计划项目面上项目(2015020089);辽宁省科学技术计划项目面上项目(201602372)

摘要:

针对事务数据库中连续型数值属性难以划分且规则提取效率较低的问题,提出一种交叉、变异种群协同进化的量化关联规则提取方法。利用帕累托原理的非支配排序对种群个体进行优化。利用个体相似度的基因型、表现型控制交叉种群中个体的配对,对变异种群采用水平集概念进行分割,并针对个体优劣分别采取单点突变和多点突变两种突变方式增强个体多样性。利用精英种群保存交叉种群与变异种群中的优秀个体并对其求取帕累托最优解集。在不同数据集上的仿真结果表明,该算法获得规则在性能和数量上达到较好的均衡,且能够有效覆盖数据集,验证了算法的有效性和可行性。

关键词: 多目标, 协同进化, 非支配, 相似度, 水平集

Abstract:

Aiming to deal with the problem that continuous numerical attributes in transaction database were difficult to divide and the efficiency of rule extraction was low, a multi-objective co-evolutionary quantification association rule extraction method was proposed with the cooperative of crossover population and mutation population. The non-dominated sorting of the Pareto principle was used to optimize the individuals of population. Genotype and phenotype of individuals similarity were used to control the matching of individuals in the crossover population. The mutation population was segmented by the concept of level set, then, single point mutation and multiple point mutation were adopted according to the quality of individuals to enhance the individuals diversity. The pareto optimal solution set was obtained from the elite population which was used to preserve the excellent individuals in the crossover population and the mutation population. The simulation results on different datasets showed that the algorithm achieved a good balance of performance and quantity, and the data set was effectively covered, which verifies the effectiveness and feasibility of the algorithm.

Key words: multi-objective, co-evolutionary, non-dominated, similarity, level set

中图分类号: 

  • TP311.13

表1

染色体编码"

A1 A2 Ai AN
T1 LB1 UB1 T2 LB2 UB2 Ti LBi UBi TN LBN UBN

图1

交叉种群进化模型"

图2

突变种群进化模型"

图3

算法模型"

表2

遗传算法进化参数"

种群大小 PCx PCn Pm 迭代次数
50 0.99 0.5 0.2 200

表3

数据集描述"

数据集 记录个数 属性个数 目标属性
Basketball 96 5 Points per minute
Bodyfat 252 18 Body height
Quake 2 178 4 Richter

表4

Basketball数据集提取的部分规则"

规则序号 规则前件 规则后件 置信度 可理解度 兴趣度
1 assists_per_minute∈(0.177 4, 0.337 8) height∈(159.011 7, 193.104 2)
points_per_minute∈(0.087 3, 0.773 9)
1 0.792 5 0.302 3
2 assists_per_minute∈(0.148 1, 0.291 6) height∈(154.136 8, 188.847 5)
time_played∈(5.578 0, 43.171 3)
age∈(22.613 6, 36.458 2)
points_per_minute∈(0.214 7, 0.691 7)
0.641 5 0.898 2 0.352 2
3 Height∈(169.988 1, 185.075 1) assists_per_minute∈(0.154 6, 0.359 5)
age∈(22.613 6, 36.458 2)
points_per_minute∈(0.214 7, 0.691 7)
0.931 0 0.861 4 0.354 3

图4

置信度均值随迭代次数变化情况"

图5

可理解度均值随迭代次数变化情况"

图6

兴趣度均值随迭代次数变化情况"

表5

提取规则支持度及规则大小"

数据集 支持度/% 规则大小
MODENAR RPSO GAR MOGAR REM_MOCGA MODENAR RPSO GAR MOGAR REM_MOCGA
Basketball 37.20 36.44 36.69 50.82 46.93 3.21 3.21 3.38 3.24 3.83
Bodyfat 65.22 65.22 65.26 57.22 41.17 6.87 7.06 7.45 6.96 7.90
Quake 39.86 38.74 38.65 30.12 47.69 2.03 2.22 2.33 2.38 3.12

表6

提取规则数目及规则置信度"

数据集 提取规则数量 置信度/%
MODENAR RPSO MOGAR REM_MOCGA MODENAR RPSO MOGAR REM_MOCGA
Basketball 48.0 34.2 50 152.3 61±2.1 60±2.8 83 88±2.9
Bodyfat 52.4 46.4 84 311.4 62±3.2 61±1.8 85 85±2.3
Quake 55.4 46.4 44.87 332.6 63±2.8 63±2.8 82 92±3.4

表7

提取规则覆盖度"

%
数据集 MODENAR RPSO GAR MOGAR REM_MOCGA
Basketball 100.00 100.00 100.00 100.00 100.00
Bodyfat 86.11 86.11 86.00 93.52 97.22
Quake 88.9 87.92 87.5 91.07 99.68
1 SRIKANT R , AGRAWAL R . Mining quantitative association rules in large relational tables[J]. ACM SIGMOD Record, 1996, 25 (2): 1- 12.
doi: 10.1145/235968
2 HAN Jiawei , PEI Jian , YIN Yiwen , et al. Mining frequent patterns without candidate generation: a frequent-pattern tree approach[J]. Data Mining and Knowledge Discovery, 2004, 8 (1): 53- 87.
3 HU Yichung , CHEN Rueyshun , TZENG Gwohshiung . Discovering fuzzy association rules using fuzzy partition methods[J]. Knowledge-Based Systems, 2003, 16 (3): 137- 147.
doi: 10.1016/S0950-7051(02)00079-5
4 CAO Hui, SI Gangquan, ZHANG Yanbin, et al. A density-based quantitative attribute partition algorithm for association rule mining on industrial database[C]//2008 American Control Conference. Seattle, USA: IEEE, 2008: 75-80.
5 KAYAA M , ALHAJJB R . Genetic algorithm based framework for mining fuzzy aAssociation rules[J]. Fuzzy Sets & Systems, 2005, 152 (3): 587- 601.
6 周丽娟, 石倩, 葛学彬, 等. 基于聚类的模糊遗传挖掘算法的研究[J]. 计算机工程与应用, 2010, 46 (13): 118- 121.
doi: 10.3778/j.issn.1002-8331.2010.13.035
ZHOU Lijuan , SHI Qian , GE Xuebin , et al. Cluster-based evaluation in fuzzy-genetic data mining[J]. Computer Engineering & Applications, 2010, 46 (13): 118- 121.
doi: 10.3778/j.issn.1002-8331.2010.13.035
7 HONG Tzungpei , CHEN Chunhao , WU Yulung , et al. A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions[J]. Soft Computing, 2006, 10 (11): 1091- 1101.
doi: 10.1007/s00500-006-0046-x
8 ALCALÁ-FDEZ J , ALCALÁ R , GACTO M J , et al. Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms[J]. Fuzzy Sets & Systems, 2009, 160 (7): 905- 921.
9 PALACIOS A M , PALACIOS J , LU IS , et al. Genetic learning of the membership functions for mining fuzzy association rules from low quality data[J]. Information Sciences, 2015, 295 (C): 358- 378.
10 JESUS M J D , GÁMEZ J A , GONZÁLEZ P , et al. On the discovery of association rules by means of evolutionary algorithms[J]. Wiley Interdisciplinary Reviews Data Mining & Knowledge Discovery, 2011, 1 (5): 397- 415.
11 ASHISH Ghosh , BHABESH Nath . Multi-objective rule mining using genetic algorithms[J]. Information Sciences, 2004, 163 (1/2/3): 123- 133.
12 ALATAS B , AKIN E . Rough particle swarm optimization and its applications in data mining[J]. Soft Computing, 2008, 12 (12): 1205- 1218.
doi: 10.1007/s00500-008-0284-1
13 ALATAS B , AKIN E , KARCI A . MODENAR: Multi-objective differential evolution algorithm for mining numeric association rules[J]. Applied Soft Computing, 2008, 8 (1): 646- 656.
14 MINAEI-BIDGOLI B , BARMAKI R , NASIRI M . Mining numerical association rules via multi-objective genetic algorithms[J]. Information Sciences, 2013, 233 (2): 15- 24.
15 DEB K , PRATAP A , AGARWAL S , et al. A fast and elitist multiobjective genetic algorithm: NSGA-Ⅱ[J]. IEEE Transactions Evolutionary Computation, 2002, 6 (2): 182- 197.
16 吴琼, 曾庆鹏. 基于多目标烟花算法的关联规则挖掘[J]. 模式识别与人工智能, 2017, 30 (4): 365- 376.
WU Qiong , ZENG Qingpeng . Association rules mining based on multi-objective fireworks optimization algorithm[J]. Pattern Recognition and Artificial Intelligence, 2017, 30 (4): 365- 376.
17 SHANG Ronghua , WANG Yuying , WANG Jia , et al. A multi-population cooperative coevolutionary algorithm for multi-objective capacitated arc routing problem[J]. Information Sciences, 2014, 277 (2): 609- 642.
18 MEHTA S B , CHAUDHURY S , BHATTACHARYYA A , et al. Tissue classification in magnetic resonance images through the hybrid approach of michigan and pittsburg genetic algorithm[J]. Applied Soft Computing, 2011, 11 (4): 3476- 3484.
doi: 10.1016/j.asoc.2011.01.021
19 NASIRI M , TAGHAVI L S , MINAEE B . Numeric multi-objective rule mining using simulated annealing algorithm[J]. Journal of Convergence Information Technology, 2011, 5 (1): 60- 68.
20 MATA J, ALVAREZ J, RIQUELME J. Discovering numeric association rules via evolutionary algorithm[C]// Advances in Knowledge Discovery and Data Mining. Berlin, Germany: Springer, 2002: 40-51.
[1] 黄劲潮. 基于快速区域建议网络的图像多目标分割算法[J]. 山东大学学报(工学版), 2018, 48(4): 20-26.
[2] 钱淑渠,武慧虹,徐国峰,金晶亮. 计及排放的动态经济调度免疫克隆演化算法[J]. 山东大学学报(工学版), 2018, 48(4): 1-9.
[3] 林江豪,周咏梅,阳爱民,陈锦. 基于词向量的领域情感词典构建[J]. 山东大学学报(工学版), 2018, 48(3): 40-47.
[4] 崔晓松,王颖,孟佳, 邹丽. 基于语言值相似度推理的网络商家自评价方法[J]. 山东大学学报(工学版), 2018, 48(1): 1-7.
[5] 王飞,徐健,李伟,汪新浩,施啸寒. 基于分布式储能系统的风储滚动优化调度方法[J]. 山东大学学报(工学版), 2017, 47(6): 89-94.
[6] 宋洋,钟麦英. 基于改进距离相似度的故障可分离性分析方法[J]. 山东大学学报(工学版), 2017, 47(5): 103-109.
[7] 庞人铭,王波,叶昊,张海峰,李明亮. 基于PCA相似度和谱聚类相结合的高炉历史数据聚类[J]. 山东大学学报(工学版), 2017, 47(5): 143-149.
[8] 李真伟,崔国忠,郭从洲,虞昌浩. 基于交替方向乘子法的图像盲复原[J]. 山东大学学报(工学版), 2017, 47(4): 14-18.
[9] 裴小兵,陈慧芬,张百栈,陈孟辉. 改善式BVEDA求解多目标调度问题[J]. 山东大学学报(工学版), 2017, 47(4): 25-30.
[10] 马帅依凡,赵子健. 基于人工标记的手术导航仪[J]. 山东大学学报(工学版), 2017, 47(3): 63-68.
[11] 邓冠龙,杨洪勇,张淑宁,顾幸生. 零等待flow shop多目标调度的混合差分进化算法[J]. 山东大学学报(工学版), 2016, 46(5): 21-28.
[12] 徐庆, 段利国, 李爱萍, 阴桂梅. 基于实体词语义相似度的中文实体关系抽取[J]. 山东大学学报(工学版), 2015, 45(6): 7-15.
[13] 刘金慧. 基于多目标非线性函数某深基坑参数反演分析[J]. 山东大学学报(工学版), 2015, 45(4): 75-83.
[14] 董红斌, 张广江, 逄锦伟, 韩启龙. 一种基于协同进化方法的聚类集成算法[J]. 山东大学学报(工学版), 2015, 45(2): 1-9.
[15] 钱肃驰, 彭甫镕, 陆建峰. 基于语义相似度的标签优化[J]. 山东大学学报(工学版), 2015, 45(2): 37-42.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 王素玉,艾兴,赵军,李作丽,刘增文 . 高速立铣3Cr2Mo模具钢切削力建模及预测[J]. 山东大学学报(工学版), 2006, 36(1): 1 -5 .
[2] 李梁,罗奇鸣,陈恩红. 对象级搜索中基于图的对象排序模型(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 15 -21 .
[3] 李可,刘常春,李同磊 . 一种改进的最大互信息医学图像配准算法[J]. 山东大学学报(工学版), 2006, 36(2): 107 -110 .
[4] 季涛,高旭,孙同景,薛永端,徐丙垠 . 铁路10 kV自闭/贯通线路故障行波特征分析[J]. 山东大学学报(工学版), 2006, 36(2): 111 -116 .
[5] Yue Khing Toh1 , XIAO Wendong2 , XIE Lihua1 . 基于无线传感器网络的分散目标跟踪:实际测试平台的开发应用(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 50 -56 .
[6] 孙炜伟,王玉振. 考虑饱和的发电机单机无穷大系统有限增益镇定[J]. 山东大学学报(工学版), 2009, 39(1): 69 -76 .
[7] 孙殿柱,朱昌志,李延瑞 . 散乱点云边界特征快速提取算法[J]. 山东大学学报(工学版), 2009, 39(1): 84 -86 .
[8] 李芳佳, 高尚策, 唐政, 石井雅博, 山下和也. 基于元胞自动化模型的三维雪花晶体近似模式的产生(英文)[J]. 山东大学学报(工学版), 2009, 39(1): 102 -105 .
[9] 卜德云 张道强. 自适应谱聚类算法研究[J]. 山东大学学报(工学版), 2009, 39(5): 22 -26 .
[10] 李善评,赵玉晓,乔鹏,冯正志 . 好氧颗粒污泥的培养及基质降解和污泥生长动力学分析[J]. 山东大学学报(工学版), 2008, 38(3): 95 -98 .