您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2025, Vol. 55 ›› Issue (2): 58-70.doi: 10.6040/j.issn.1672-3961.0.2024.045

• 机器学习与数据挖掘 • 上一篇    下一篇

融合局部截断距离及小簇合并的密度峰值聚类

陈素根1,2,赵志忠1*   

  1. 1.安庆师范大学数理学院, 安徽 安庆 246133;2.安徽省大别山区域复杂生态系统建模、仿真与控制重点实验室, 安徽 安庆 246133
  • 发布日期:2025-04-15
  • 作者简介:陈素根(1982—),男,安徽当涂人,教授,硕士生导师,博士,主要研究方向为模式识别、机器学习等. E-mail: chensugen@126.com. *通信作者简介:赵志忠(1999—),男,安徽淮南人,硕士研究生,主要研究方向为模式识别、机器学习. E-mail: zhaozz1999@126.com
  • 基金资助:
    国家自然科学基金青年基金资助项目(61702012);安徽省自然科学基金面上资助项目(2008085MF193);安徽省高等学校科学研究重点资助项目(2024AH051095)

Density peak clustering combining local truncation distance and small clusters merging

CHEN Sugen1,2, ZHAO Zhizhong1*   

  1. 1. School of Mathematics and Physics, Anqing Normal University, Anqing 246133, Anhui, China;
    2. Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing 246133, Anhui, China
  • Published:2025-04-15

摘要: 针对密度峰值聚类算法定义的截断距离仅考虑样本全局分布,在样本分配时容易产生“多米诺骨牌”现象等问题,提出一种融合局部截断距离及小簇合并的密度峰值聚类算法。基于样本局部分布信息计算每个样本截断距离和局部密度,有利于准确获得复杂结构数据集上密度峰;根据样本决策值之间差值关系选择潜在密度峰并形成多个小簇;定义一种新的小簇间相似度,根据此相似度将小簇合并获得聚类结果,有效避免了“多米诺骨牌”现象。采用6个人工数据集和8个UCI数据集进行验证,所提算法在上述14个数据集上的标准化互信息、调整兰德系数和调整互信息平均值比5个对比算法平均提高18.15%、28.99%和20.22%,比原始密度峰值聚类算法提高30.06%,47.15%和31.90%,具有较好的聚类效果。

关键词: 聚类, 密度峰值聚类, 截断距离, 局部密度, 潜在密度峰

Abstract: Aiming at the problems that the truncation distance defined by the density peak clustering algorithm only considered the global distribution of samples and the "domino" phenomenon was easy to occur when assigning samples, a novel density peak clustering algorithm combining local truncation distance and small clusters merging was proposed. The truncation distance and local density of each sample were calculated based on the local distribution information of samples, which were conducive to accurately obtaining the density peaks on complex structure datasets. Potential density peaks were selected based on the difference between samples decision values and multiple small clusters were formed. A new kind of similarity between clusters was defined, and clusters were merged to obtain clustering results according to this similarity, which effectively avoided the "domino" phenomenon. Compared with several clustering algorithms on six synthetic datasets and eight UCI datasets, the standardized mutual information, adjusted rand index and adjusted mutual information average values of the proposed algorithm on 14 datasets were 18.15%, 28.99% and 20.22% higher than the five comparison algorithms on average, especially 30.06%, 47.15% and 31.90% higher than original density peak clustering algorithm. Experimental results showed the proposed algorithm had a good clustering effect.

Key words: clustering, density peak clustering, truncation distance, local density, potential density peaks

中图分类号: 

  • TP391
[1] JING Wenbo, JIN Tian, XIANG Deliang. Fast superpixel-based clustering algorithm for SAR image segmentation[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19:1-5.
[2] 颜长春, 廖俊. 中国平台经济发展水平评价指标体系构建与测度[J]. 统计与决策, 2023, 39(11): 5-10. YAN Changchun, LIAO Jun. Platform of China economic development level evaluation index system construction and measure[J]. Journal of Statistics and Decision, 2023, 33(11): 5-10.
[3] ZHAO Yuan, FANG Zhaoyu, LIN Cuixiang, et al. RFCell: a gene selection approach for scRNA-seq clustering based on permutation and Random Forest[J]. Frontiers in Genetics, 2021, 12: 665843.
[4] GAO Tengfei, CHEN Dan, TANG Yunbo, et al. Adaptive density peaks clustering: towards exploratory EEG analysis[J]. Knowledge-Based Systems, 2022, 240: 108123.
[5] LI Chuanwei, CHEN Hongmei, LI Tianrui, et al. A stable community detection approach for complex network based on density peak clustering and label propagation[J]. Applied Intelligence, 2022, 52(2): 1188-1208.
[6] GUAN X, TERADA Y. Sparse kernel k-means for high-dimensional data[J]. Pattern Recognition, 2023, 144: 109873.
[7] WANG W, YANG J, MUNTZ R. STING: a statistical information grid approach to spatial data mining[C] //Proceedings of 23rd International Conference on Very Large Data Bases. San Francisco, USA: ACM, 1997: 186-195.
[8] ANDREAS L, ERICH S. BETULA: fast clustering of large data with improved BIRCH CF-Trees[J]. Information Systems, 2022, 108: 101918.
[9] HUANG X G, MA T F, LIU C, et al. GriT-DBSCAN: a spatial clustering algorithm for very large databases[J]. Pattern Recognition, 2023, 142: 109658.
[10] FU Nan, NI Weiwei, HU Haibo, et al. Multidime-nsional grid-based clustering with local differential privacy[J]. Information Sciences, 2023, 623: 402-420.
[11] SUN Mingchen, YANG Mengduo, LI Yingji, et al. Structural-aware motif-based prompt tuning for graph clustering[J]. Information Sciences, 2023, 649: 119643.
[12] WU Chengmao, YU Dongxue. Generalized possibilistic c-means clustering with double weighting exponents[J]. Information Sciences, 2023, 645: 119283.
[13] RODRIGUZE A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.
[14] WANG Y Z, QIAN J X, HASSAN M, et al. Density peak clustering algorithms: a review on the decade 2014—2023[J]. Expert Systems with Applications, 2024, 238: 121860.
[15] WEI Xiuxiu, PENG Maosong, HUANG Huajuan, et al. An overview on density peaks clustering[J]. Neurocomputing, 2023, 554: 126633.
[16] 陈叶旺, 申莲莲, 钟才明,等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2): 378-394. CHEN Yewang, SHEN Lianlian, ZHONG Caiming, et al. Survey on density peak clustering algorithm[J]. Journal of Computer Research and Development, 2020, 57(2): 378-394.
[17] DING Shifei, DU Wei, LI Chao, et al. Density peaks clustering algorithm based on improved similarity and allocation strategy[J]. International Journal of Machine Learning and Cybernetics, 2023, 14(4): 1527-1542.
[18] ZHOU Zhou, SI Gangquan, SUN Haodong, et al. A robust clustering algorithm based on the identification of core points and KNN kernel density estimation[J]. Expert Systems with Applications, 2022, 195: 116573.
[19] YAN Huan, WANG Mingzhao, XIE Juanying. ANN-DPC: density peak clustering by finding the adaptive nearest neighbors[J]. Knowledge-Based Systems, 2024, 294: 111748.
[20] YU Donghua, LIU Guojun, GUO Maozu, et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7: 34301-34317.
[21] ZHAO J, WANG G, PAN J S, et al. Density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets[J]. Pattern Recognition, 2023, 139: 109406.
[22] 赵嘉, 姚占峰, 吕莉, 等. 基于相互邻近度的密度峰值聚类算法[J]. 控制与决策, 2021, 36(3): 543-552. ZHAO Jia, YAO Zhanfeng, LÜ Li, et al. Density peaks clustering based on mutual neighbor degree[J]. Control and Decision, 2021, 36(3): 543-552.
[23] GUAN Junyi, LI Sheng, HE Xiongxiong, et al. Fast hierarchical clustering of local density peaks via an association degree transfer method[J]. Neuro-computing, 2021, 445: 401-418.
[24] GUO Wenjie, WANG Wenhai, ZHAO Shunping, et al. Density peak clustering with connectivity estimation[J]. Knowledge-Based Systems, 2022, 243: 108501.
[25] QIAO Kaikai, CHEN Jiawei, DUAN Shukai. Self-adaptive two-stage density clustering method with fuzzy connectivity[J]. Applied Soft Computing, 2024, 154: 111355.
[26] 赵嘉, 马清, 肖人彬,等. 面向流形数据的共享近邻密度峰值聚类算法[J]. 智能系统学报, 2023, 18(4): 719-730. ZHAO Jia, MA Qing, XIAO Renbin, et al. Shared neighbor density peak clustering algorithm for manifold data[J]. CAAI Transactions on Intelligent Systems, 2023, 18(4): 719-730.
[27] CHENG Dongdong, ZHU Qingsheng, HUANG Jinlong, et al. Clustering with local density peaks-based minimum spanning tree[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(2): 374-387.
[28] QIU Teng, LI Yongjie. Fast LDP-MST: an efficient density-peak-based clustering method for large-size datasets[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 4767-4780.
[29] MILAAN P. Clustering datasets[EB/OL].(2022-12-10)[2023-10-11]. https://github.com/milaan9/Clustering-Datasets
[30] BLAKE C L, MERZ C J. UCI repository of machine database[EB/OL].(2023-09-29)[2023-10-15]. https://archive.ics.uci.edu/datasets
[31] VINH N, EPPS J, BAILEY J. Information theoretic measures for clustering comparison: variants, properties, normalization and correction for chance[J]. Journal of Machine Learning Research, 2010, 11: 2837-2854.
[32] DONALD W Z, BRUNO D Z. Relative power of the Wilcoxon Test, the Friedman Test, and Repeated-Measures ANOVA on Ranks[J]. The Journal of Experimental Education, 1993, 62(1): 75-86.
[1] 王梅,宋凯文,刘勇,王志宝,万达. DMKK-means——一种深度多核K-means聚类算法[J]. 山东大学学报 (工学版), 2024, 54(6): 1-7.
[2] 王丽娟,徐晓,丁世飞. 面向密度峰值聚类的高效相似度度量[J]. 山东大学学报 (工学版), 2024, 54(3): 12-21.
[3] 张鑫,费可可. 基于log鲁棒核岭回归的子空间聚类算法[J]. 山东大学学报 (工学版), 2023, 53(6): 26-34.
[4] 李兆彬,叶军,周浩岩,卢岚,谢立. 变异萤火虫优化的粗糙K-均值聚类算法[J]. 山东大学学报 (工学版), 2023, 53(4): 74-82.
[5] 侯延琛,赵金东. 任意形状聚类的SPK-means算法[J]. 山东大学学报 (工学版), 2023, 53(2): 87-92.
[6] 卢建云,张蔚,李林. 一种基于动态局部密度和聚类结构的聚类算法[J]. 山东大学学报 (工学版), 2022, 52(2): 118-127.
[7] 程业超,刘惊雷. 自适应图正则的单步子空间聚类[J]. 山东大学学报 (工学版), 2022, 52(2): 57-66.
[8] 孟银凤,杨佳宇,曹付元. 函数型数据的分裂转移式层次聚类算法[J]. 山东大学学报 (工学版), 2022, 52(1): 19-27.
[9] 朱恒东, 马盈仓, 代雪珍. 自适应半监督邻域聚类算法[J]. 山东大学学报 (工学版), 2021, 51(4): 24-34.
[10] 解子奇,王立宏,李嫚. 块对角子空间聚类中成对约束的主动式学习[J]. 山东大学学报 (工学版), 2021, 51(2): 65-73.
[11] 朱昌明,岳闻,王盼红,沈震宇,周日贵. 主动三支聚类下的全局和局部多视角多标签学习算法[J]. 山东大学学报 (工学版), 2021, 51(2): 34-46.
[12] 李蓓,赵松,谢志佳,牛萌. 电动汽车虚拟储能可用容量建模[J]. 山东大学学报 (工学版), 2020, 50(6): 101-111.
[13] 秦军,张远鹏,蒋亦樟,杭文龙. 多代表点自约束的模糊迁移聚类[J]. 山东大学学报 (工学版), 2019, 49(2): 107-115.
[14] 董新宇,陈瀚阅,李家国,孟庆岩,邢世和,张黎明. 基于多方法融合的非监督彩色图像分割[J]. 山东大学学报 (工学版), 2019, 49(2): 96-101.
[15] 朱映雪,黄瑞章,马灿. 一种具有新主题偏向性的短文本动态聚类方法[J]. 山东大学学报 (工学版), 2018, 48(6): 8-18.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 邓斌,王江 . 基于混沌同步与自适应控制的神经元模型参数估计[J]. 山东大学学报(工学版), 2007, 37(5): 19 -23 .
[2] 王进野,姚瑞英,张纪良,王其军 . 一类模糊双曲正切模型稳定性控制[J]. 山东大学学报(工学版), 2007, 37(2): 63 -66 .
[3] 夏 斌,张连俊 . DS-CDMA UWB系统中基于能量比较的TOA估计算法[J]. 山东大学学报(工学版), 2007, 37(1): 70 -73 .
[4] 员冬玲,邓建新,丁泽良,段振兴 . 梯度陶瓷水煤浆喷嘴的残余热应力有限元分析[J]. 山东大学学报(工学版), 2008, 38(2): 18 -22 .
[5] 余嘉元1 , 田金亭1 , 朱强忠2 . 计算智能在心理学中的应用[J]. 山东大学学报(工学版), 2009, 39(1): 1 -5 .
[6] 高厚磊 田佳 杜强 武志刚 刘淑敏. 能源开发新技术——分布式发电[J]. 山东大学学报(工学版), 2009, 39(5): 106 -110 .
[7] 杨永顺1,王林2,高雪池1,贾海庆3,韦金城2 . 永久路面结构应变分布及疲劳损伤分析[J]. 山东大学学报(工学版), 2009, 39(2): 118 -124 .
[8] 王汝贵,蔡敢为 . 两自由度可控平面连杆机构机电耦合系统的超谐波共振分析[J]. 山东大学学报(工学版), 2008, 38(3): 58 -63 .
[9] 刘飞宏,王建明*,余丰,张刚. 基于SPH耦合有限元法的喷丸残余应力场数值模拟[J]. 山东大学学报(工学版), 2010, 40(6): 67 -71 .
[10] 刘晓平 王洪运 张鹏 秦绪平 张孟力. 三元共聚阳离子聚丙烯酰胺的合成及性能评价[J]. 山东大学学报(工学版), 2009, 39(3): 71 -76 .