山东大学学报 (工学版) ›› 2023, Vol. 53 ›› Issue (4): 74-82.doi: 10.6040/j.issn.1672-3961.0.2022.268
• 机器学习与数据挖掘 • 上一篇
李兆彬1,叶军1,2,周浩岩1,卢岚1,谢立1
LI Zhaobin1, YE Jun1,2, ZHOU Haoyan1, LU Lan1, XIE Li1
摘要: 粗糙K-means聚类算法存在如下不足:随机选取初始聚类中心会导致算法过早收敛容易陷入局部最优,质心更新公式中的权重和决定对象划分的阈值采用定值有时会导致聚类结果波动较大和精度下降。针对以上问题,引入一种变异策略和差分进化的萤火虫算法,从3个方面进行优化:构造新的目标函数,以目标函数值作为萤火虫光亮强度进行初始聚类中心点的搜索,把萤火虫算法求得的最优解作为算法的聚类中心进行聚类迭代;以下近似集和边界集中对象数量的变化以及对象分布的差异性动态调整质心权重;给出一种通过迭代次数自动获取阈值的方法。试验结果表明,改进后的算法减少了迭代次数,聚类结果稳定性好,准确率更高,改善了算法对随机初始中心点的敏感和稳定性不足等问题。
中图分类号:
[1] HATHAWAY R J, BEZDEK J C. Switching regression models and fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 1993, 1(3): 195-204. [2] KRISHNAPURAM R, KELLER J M. A possibilistic approach to clustering[J]. IEEE Transactions on Fuzzy Systems, 1993, 1(2): 98-110. [3] PAWLAK Z. Rough sets[J]. International Journal of Information and Computer Sciences, 1982, 11(5): 341-356. [4] LINGRAS P, WEST C. Interval set clustering of web users with rough K-means[J]. Journal of Intelligent Information Systems, 2004, 23(1): 5-16. [5] 谢娟英,张琰,谢维信,等.一种新的密度加权粗糙K-均值聚类算法[J].山东大学学报(理学版),2010,45(7): 1-6. XIE Juanying, ZHANG Yan, XIE Weixin, et al. A novel rough K-means clustering algorithm based on the weight of density[J]. Journal of Shangdong University(Natural Science), 2010, 45(7): 1-6. [6] 段文影,李向军,邱桃荣,等. 一种具有自适应参数的基于密度加权的粗糙K-均值算法[J]. 南昌大学学报(理科版), 2012,36(5): 498-501. DUAN Wenying, LI Xiangjun, QIU Taorong, et al. Rough K-means clustering algorithm with self-adaptive parameter and weighted-density[J]. Journal of Nanchang University(Natural Science), 2012, 36(5): 498-501. [7] KHAN H S, AHMAD A. Cluster center initialization algorithm for K-means clustering[J]. Pattern Recognition Letter, 2004, 25( 11): 1293-1302. [8] 马福民,孙静勇,张腾飞.考虑边界样本邻域归属信息的粗糙K-means增量聚类算法[J].控制与决策,2022,37(11):2968-2976. MA Fumin, SUN Jingyong, ZHANG Tengfei. A rough K-means incremental clustering algorithm considering neighborhood attribution information of boundary samples [J]. Control and Decision, 2022, 37(11):2968-2976. [9] 李艳,范斌,郭劼,等.基于K-原型聚类和粗糙集的属性约简方法[J].计算机科学,2021,48(增刊1):342-348. LI Yan, FAN Bin, GUO Jie, et al. Attribute reduction method based on K-prototype clustering and rough sets[J]. Computer Science, 2021, 48(Suppl.1):342-348. [10] 王子龙,李进,宋亚飞.基于距离和权重改进的K-means算法[J].计算机工程与应用,2020,56(23):87-94. WANG Zilong, LI Jin, SONG Yafei. Improved K-means algorithm based on distance and weight[J]. Computer Engineering and Applications,2020,56(23):87-94. [11] 洪亮亮,罗可.改进的基于遗传算法的粗糙聚类方法[J].计算机工程与应用,2010,46(25):142-145. HONG Liangliang, LUO Ke. Improved rough clustering method based on genetic algorithm[J]. Computer Engineering and Applications, 2010, 46(25):142-145. [12] 汤文亮,张平,汤树芳.基于精英反向学习的萤火虫K-means改进算法[J].计算机工程与设计, 2019, 40(11):3164-3169. TANG Wenliang, ZHANG Ping, TANG Shufang. An improved firefly K-means algorithm based on elite reverse learning[J]. Computer Engineering and Design,2019,40(11):3164-3169. [13] 刘洋,王慧琴,张小红.结合蚁群算法的改进粗糙K均值聚类算法[J].数据采集与处理,2019,34(2):341-348. LIU Yang, WANG Huiqin, ZHANG Xiaohong. An improved rough K means clustering algorithm combined with ant colony algorithm[J]. Journal of Data Acquisition and Processing, 2019, 34(2):341-348. [14] 叶廷宇,叶军,王晖,等.结合人工蜂群优化的粗糙K-means聚类算法[J].计算机科学与探索,2022,16(8):1923-1932. YE Tingyu, YE Jun, WANG Hui, et al. Rough K-means clustering algorithm combined with artificial bee colony optimization[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8):1923-1932. [15] KUMAR V, KUMAR D. A systematic review on firefly algorithm: past, present, and future[J]. Archives of Computational Methods in Engineering, 2021, 28(4): 3269-3291. [16] CUI Z, CHANG Y, ZHANG J, et al. Improved NSGA-III with selection-and-elimination operator[J]. Swarm and Evolutionary Computation, 2019, 49: 23-33. [17] PAN L, HE C, TIAN Y, et al. A classification-based surrogate-assisted evolutionary algorithm for expensive many-objective optimization[J]. IEEE Transactions on Evolutionary Computation, 2018, 23(1): 74-88. [18] CUI Z, CAO Y, CAI X, et al. Optimal LEACH protocol with modified bat algorithm for big data sensing systems in Internet of Things[J]. Journal of Parallel and Distributed Computing, 2019, 132: 217-229. [19] SENTHILNATH J, OMKAR S N, MANI V. Clustering using firefly algorithm: performance study[J]. Swarm and Evolutionary Computation, 2011, 1(3): 164-171. [20] 王国胤,姚一豫,于洪.粗糙集理论与应用研究综述[J].计算机学报,2009,32(7):1229-1246. WANG Guoyin, YAO Yiyu, YU Hong. A review of rough set theory and application[J].Chinese Journal of Computers, 2009, 32(7):1229-1246. [21] PETERS G. Outliers in rough K-means clustering[C] //International Conference on Pattern Recognition and Machine Intelligence. Berlin, Germany:Springer, 2005: 702-707. [22] PETERS G. Some refinements of rough K-means clustering[J]. Pattern Recognition, 2006, 39(8): 1481-1491. [23] 马福民,逯瑞强,张腾飞.基于局部密度自适应度量的粗糙K-means聚类算法[J].计算机工程与科学, 2018,40(1):184-190. MA Fumin, LU Ruiqiang, ZHANG Tengfei. Rough K-means clustering algorithm based on adaptive measure of local density[J].Computer Engineering & Science, 2018, 40(1):184-190. [24] YANG X S. Firefly algorithms for multimodal optimization[C] / / International Symposium on Stochastic Algorithms. Berlin, Germany: Springer, 2009: 169-178. [25] HASSAN B A. CSCF: a chaotic sine cosine firefly algorithm for practical application problems[J]. Neural Computing and Applications, 2021, 33(12): 7011-7030. [26] 张大力,夏红伟,张朝兴,等.改进萤火虫算法及其收敛性分析[J].系统工程与电子技术,2022,44(4):1291-1300. ZHANG Dali, XIA Hongwei, ZHANG Chaoxing, et al. Improved firefly algorithm and its convergence analysis[J]. Systems Engineering and Electronics, 2022, 44(4):1291-1300. [27] 周涛.具有自适应参数的粗糙K-means聚类算法[J].计算机工程与应用,2010,46(26):7-10. ZHOU Tao. A rough K-means clustering algorithm with adaptive parameters[J]. Computer Engineering and Applications, 2010, 46(26):7-10. [28] 李莲,罗可,周博翔.基于粒计算的粗糙集聚类算法[J].计算机应用研究,2013,30(10):2916-2919. LI Lian, LUO Ke, ZHOU Boxiang. Rough clustering algorithm based on particle computing[J]. Application Research of Computers, 2013, 30(10):2916-2919. [29] PETERS G. Rough clustering utilizing the principle of indifference[J]. Information Sciences, 2014, 277: 358-374. [30] 周杨,苗夺谦,岳晓冬.基于自适应权重的粗糙K均值聚类算法[J].计算机科学,2011,38(6):237-241. ZHOU Yang, MIAO Duoqian, YUE Xiaodong. Rough K-means clustering algorithm based on adaptive weights[J]. Computer Science, 2011, 38(6):237-241. |
[1] | 黄华娟,程前,韦修喜,于楚楚. 融合Jaya高斯变异的自适应乌鸦搜索算法[J]. 山东大学学报 (工学版), 2023, 53(2): 11-22. |
[2] | 姚玉权,黄伯承,宋亮,张建,仰建岗,高杰. 多来源RAP下RHMA材料组成的动态控制策略[J]. 山东大学学报 (工学版), 2022, 52(6): 79-88. |
[3] | 李进,李二超. 基于正态分布和自适应变异算子的ε截断算法[J]. 山东大学学报 (工学版), 2019, 49(2): 47-53. |
[4] | 杨天鹏,徐鲲鹏,陈黎飞. 非均匀数据的变异系数聚类算法[J]. 山东大学学报(工学版), 2018, 48(3): 140-145. |
[5] | 江峰,杜军威,刘国柱,眭跃飞. 基于加权的K-modes聚类初始中心选择算法[J]. 山东大学学报(工学版), 2016, 46(2): 29-34. |
[6] | 景运革,李天瑞. 基于知识粒度的增量约简算法[J]. 山东大学学报(工学版), 2016, 46(1): 1-9. |
[7] | 刘晓勇. 一种基于树核函数的半监督关系抽取方法研究[J]. 山东大学学报(工学版), 2015, 45(2): 22-26. |
[8] | 梁兴建, 詹志辉. 基于双模式变异策略的改进遗传算法[J]. 山东大学学报(工学版), 2014, 44(6): 1-7. |
[9] | 贺思艳1,李鹏2,刘澄玉2,吴学谦2,陈启军3. 互模糊熵中隶属函数的改进和影响分析[J]. 山东大学学报(工学版), 2014, 44(1): 63-68. |
[10] | 高峰1,迟春梅2. 决策表中属性的重排[J]. 山东大学学报(工学版), 2013, 43(5): 6-12. |
[11] | 樊伟. 一种多粒度粗糙区间模糊集方法[J]. 山东大学学报(工学版), 2013, 43(1): 63-68. |
[12] | 张潇丹,赵力,邹采荣*. 一种改进的混合蛙跳算法求解有约束优化问题[J]. 山东大学学报(工学版), 2013, 43(1): 1-8. |
[13] | 陈玉明,吴克寿,谢荣生. 基于相对知识粒度的决策表约简[J]. 山东大学学报(工学版), 2012, 42(6): 8-12. |
[14] | 杨习贝1,2,黄佳玲1,周君仪3,杨静宇2. 不完备系统中基于特征相容块的粗糙集[J]. 山东大学学报(工学版), 2012, 42(5): 1-6. |
[15] | 李慧1,2,胡云1,3,李存华1. 基于粗糙集理论的瓦斯灾害信息特征提取技术[J]. 山东大学学报(工学版), 2012, 42(5): 91-95. |
|