Journal of Shandong University (Engineering Science) ›› 2023, Vol. 53 ›› Issue (2): 87-92. doi: 10.6040/j.issn.1672-3961.0.2022.341
HOU Yanchen (侯延琛), ZHAO Jindong* (赵金东)
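Abstract: The K-means clustering algorithm uses only centroids as the basis for clustering, so it performs poorly on non-spherical datasets and does not reflect the shape characteristics of the data. To address this problem, a shape-based K-means algorithm (shape K-means, SPK-means) is proposed. Its assignment rule uses both the distance from a point to the centroid of each cluster and the distance from the point to the nearest edge point of each cluster, which enables it to cluster datasets of arbitrary shape. Clustering experiments on two different datasets show that, on non-spherical datasets, the results of SPK-means follow the shape characteristics of the original dataset.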
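The abstract does not state how the two distances are combined, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It approximates the "nearest edge point" of a cluster by the cluster member closest to the point being assigned, and blends the two distances with a hypothetical weight `alpha`. The names `centroid`, `nearest_member_distance`, `assign`, and `alpha` are illustrative and do not come from the paper.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_member_distance(point, members):
    """Distance from point to the closest member of a cluster.

    Assumption: this stands in for the 'nearest edge point' distance.
    """
    return min(math.dist(point, m) for m in members)


def assign(point, clusters, alpha=0.5):
    """Return the index of the cluster with the lowest blended score.

    alpha weights the centroid distance and (1 - alpha) weights the
    nearest-member distance. The linear blend is an assumption made
    for illustration; alpha=1.0 reduces to the plain K-means rule.
    """
    scores = []
    for members in clusters:
        d_centroid = math.dist(point, centroid(members))
        d_edge = nearest_member_distance(point, members)
        scores.append(alpha * d_centroid + (1 - alpha) * d_edge)
    return scores.index(min(scores))


# An elongated cluster along the x-axis and a compact blob.
line = [(float(x), 0.0) for x in range(0, 11)]
blob = [(12.0, 3.0), (12.5, 3.5), (13.0, 3.0), (12.5, 2.5)]
p = (10.5, 0.2)  # lies just past the end of the elongated cluster

by_centroid = assign(p, [line, blob], alpha=1.0)  # centroid-only rule
by_blend = assign(p, [line, blob], alpha=0.3)     # blended rule
```

In this toy example the centroid-only rule (`alpha=1.0`) sends the point to the compact blob, because the centroid of the elongated cluster is far away, while the blended rule keeps the point with the elongated cluster it sits next to.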