您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2025, Vol. 55 ›› Issue (2): 58-70.doi: 10.6040/j.issn.1672-3961.0.2024.045

• 机器学习与数据挖掘 • 上一篇    下一篇

融合局部截断距离及小簇合并的密度峰值聚类

陈素根1,2,赵志忠1*   

  1. 1.安庆师范大学数理学院, 安徽 安庆 246133;2.安徽省大别山区域复杂生态系统建模、仿真与控制重点实验室, 安徽 安庆 246133
  • 发布日期:2025-04-15
  • 作者简介:陈素根(1982—),男,安徽当涂人,教授,硕士生导师,博士,主要研究方向为模式识别、机器学习等. E-mail: chensugen@126.com. *通信作者简介:赵志忠(1999—),男,安徽淮南人,硕士研究生,主要研究方向为模式识别、机器学习. E-mail: zhaozz1999@126.com
  • 基金资助:
    国家自然科学基金青年基金资助项目(61702012);安徽省自然科学基金面上资助项目(2008085MF193);安徽省高等学校科学研究重点资助项目(2024AH051095)

Density peak clustering combining local truncation distance and small clusters merging

CHEN Sugen1,2, ZHAO Zhizhong1*   

  1. 1. School of Mathematics and Physics, Anqing Normal University, Anqing 246133, Anhui, China;
    2. Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing 246133, Anhui, China
  • Published:2025-04-15

摘要: 针对密度峰值聚类算法定义的截断距离仅考虑样本全局分布,在样本分配时容易产生“多米诺骨牌”现象等问题,提出一种融合局部截断距离及小簇合并的密度峰值聚类算法。基于样本局部分布信息计算每个样本截断距离和局部密度,有利于准确获得复杂结构数据集上密度峰;根据样本决策值之间差值关系选择潜在密度峰并形成多个小簇;定义一种新的小簇间相似度,根据此相似度将小簇合并获得聚类结果,有效避免了“多米诺骨牌”现象。采用6个人工数据集和8个UCI数据集进行验证,所提算法在上述14个数据集上的标准化互信息、调整兰德系数和调整互信息平均值比5个对比算法平均提高18.15%、28.99%和20.22%,比原始密度峰值聚类算法提高30.06%,47.15%和31.90%,具有较好的聚类效果。

关键词: 聚类, 密度峰值聚类, 截断距离, 局部密度, 潜在密度峰

Abstract: Aiming at the problems that the truncation distance defined by the density peak clustering algorithm only considered the global distribution of samples and the "domino" phenomenon was easy to occur when assigning samples, a novel density peak clustering algorithm combining local truncation distance and small clusters merging was proposed. The truncation distance and local density of each sample were calculated based on the local distribution information of samples, which were conducive to accurately obtaining the density peaks on complex structure datasets. Potential density peaks were selected based on the difference between samples decision values and multiple small clusters were formed. A new kind of similarity between clusters was defined, and clusters were merged to obtain clustering results according to this similarity, which effectively avoided the "domino" phenomenon. Compared with several clustering algorithms on six synthetic datasets and eight UCI datasets, the standardized mutual information, adjusted rand index and adjusted mutual information average values of the proposed algorithm on 14 datasets were 18.15%, 28.99% and 20.22% higher than the five comparison algorithms on average, especially 30.06%, 47.15% and 31.90% higher than original density peak clustering algorithm. Experimental results showed the proposed algorithm had a good clustering effect.

Key words: clustering, density peak clustering, truncation distance, local density, potential density peaks

中图分类号: 

  • TP391
[1] JING Wenbo, JIN Tian, XIANG Deliang. Fast superpixel-based clustering algorithm for SAR image segmentation[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19:1-5.
[2] 颜长春, 廖俊. 中国平台经济发展水平评价指标体系构建与测度[J]. 统计与决策, 2023, 39(11): 5-10. YAN Changchun, LIAO Jun. Platform of China economic development level evaluation index system construction and measure[J]. Journal of Statistics and Decision, 2023, 33(11): 5-10.
[3] ZHAO Yuan, FANG Zhaoyu, LIN Cuixiang, et al. RFCell: a gene selection approach for scRNA-seq clustering based on permutation and Random Forest[J]. Frontiers in Genetics, 2021, 12: 665843.
[4] GAO Tengfei, CHEN Dan, TANG Yunbo, et al. Adaptive density peaks clustering: towards exploratory EEG analysis[J]. Knowledge-Based Systems, 2022, 240: 108123.
[5] LI Chuanwei, CHEN Hongmei, LI Tianrui, et al. A stable community detection approach for complex network based on density peak clustering and label propagation[J]. Applied Intelligence, 2022, 52(2): 1188-1208.
[6] GUAN X, TERADA Y. Sparse kernel k-means for high-dimensional data[J]. Pattern Recognition, 2023, 144: 109873.
[7] WANG W, YANG J, MUNTZ R. STING: a statistical information grid approach to spatial data mining[C] //Proceedings of 23rd International Conference on Very Large Data Bases. San Francisco, USA: ACM, 1997: 186-195.
[8] ANDREAS L, ERICH S. BETULA: fast clustering of large data with improved BIRCH CF-Trees[J]. Information Systems, 2022, 108: 101918.
[9] HUANG X G, MA T F, LIU C, et al. GriT-DBSCAN: a spatial clustering algorithm for very large databases[J]. Pattern Recognition, 2023, 142: 109658.
[10] FU Nan, NI Weiwei, HU Haibo, et al. Multidime-nsional grid-based clustering with local differential privacy[J]. Information Sciences, 2023, 623: 402-420.
[11] SUN Mingchen, YANG Mengduo, LI Yingji, et al. Structural-aware motif-based prompt tuning for graph clustering[J]. Information Sciences, 2023, 649: 119643.
[12] WU Chengmao, YU Dongxue. Generalized possibilistic c-means clustering with double weighting exponents[J]. Information Sciences, 2023, 645: 119283.
[13] RODRIGUZE A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.
[14] WANG Y Z, QIAN J X, HASSAN M, et al. Density peak clustering algorithms: a review on the decade 2014—2023[J]. Expert Systems with Applications, 2024, 238: 121860.
[15] WEI Xiuxiu, PENG Maosong, HUANG Huajuan, et al. An overview on density peaks clustering[J]. Neurocomputing, 2023, 554: 126633.
[16] 陈叶旺, 申莲莲, 钟才明,等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2): 378-394. CHEN Yewang, SHEN Lianlian, ZHONG Caiming, et al. Survey on density peak clustering algorithm[J]. Journal of Computer Research and Development, 2020, 57(2): 378-394.
[17] DING Shifei, DU Wei, LI Chao, et al. Density peaks clustering algorithm based on improved similarity and allocation strategy[J]. International Journal of Machine Learning and Cybernetics, 2023, 14(4): 1527-1542.
[18] ZHOU Zhou, SI Gangquan, SUN Haodong, et al. A robust clustering algorithm based on the identification of core points and KNN kernel density estimation[J]. Expert Systems with Applications, 2022, 195: 116573.
[19] YAN Huan, WANG Mingzhao, XIE Juanying. ANN-DPC: density peak clustering by finding the adaptive nearest neighbors[J]. Knowledge-Based Systems, 2024, 294: 111748.
[20] YU Donghua, LIU Guojun, GUO Maozu, et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7: 34301-34317.
[21] ZHAO J, WANG G, PAN J S, et al. Density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets[J]. Pattern Recognition, 2023, 139: 109406.
[22] 赵嘉, 姚占峰, 吕莉, 等. 基于相互邻近度的密度峰值聚类算法[J]. 控制与决策, 2021, 36(3): 543-552. ZHAO Jia, YAO Zhanfeng, LÜ Li, et al. Density peaks clustering based on mutual neighbor degree[J]. Control and Decision, 2021, 36(3): 543-552.
[23] GUAN Junyi, LI Sheng, HE Xiongxiong, et al. Fast hierarchical clustering of local density peaks via an association degree transfer method[J]. Neuro-computing, 2021, 445: 401-418.
[24] GUO Wenjie, WANG Wenhai, ZHAO Shunping, et al. Density peak clustering with connectivity estimation[J]. Knowledge-Based Systems, 2022, 243: 108501.
[25] QIAO Kaikai, CHEN Jiawei, DUAN Shukai. Self-adaptive two-stage density clustering method with fuzzy connectivity[J]. Applied Soft Computing, 2024, 154: 111355.
[26] 赵嘉, 马清, 肖人彬,等. 面向流形数据的共享近邻密度峰值聚类算法[J]. 智能系统学报, 2023, 18(4): 719-730. ZHAO Jia, MA Qing, XIAO Renbin, et al. Shared neighbor density peak clustering algorithm for manifold data[J]. CAAI Transactions on Intelligent Systems, 2023, 18(4): 719-730.
[27] CHENG Dongdong, ZHU Qingsheng, HUANG Jinlong, et al. Clustering with local density peaks-based minimum spanning tree[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(2): 374-387.
[28] QIU Teng, LI Yongjie. Fast LDP-MST: an efficient density-peak-based clustering method for large-size datasets[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 4767-4780.
[29] MILAAN P. Clustering datasets[EB/OL].(2022-12-10)[2023-10-11]. https://github.com/milaan9/Clustering-Datasets
[30] BLAKE C L, MERZ C J. UCI repository of machine database[EB/OL].(2023-09-29)[2023-10-15]. https://archive.ics.uci.edu/datasets
[31] VINH N, EPPS J, BAILEY J. Information theoretic measures for clustering comparison: variants, properties, normalization and correction for chance[J]. Journal of Machine Learning Research, 2010, 11: 2837-2854.
[32] DONALD W Z, BRUNO D Z. Relative power of the Wilcoxon Test, the Friedman Test, and Repeated-Measures ANOVA on Ranks[J]. The Journal of Experimental Education, 1993, 62(1): 75-86.
[1] 李晓辉,刘小飞,孙炜桐,赵毅,董媛,靳引利. 基于车辆与无人机协同的巡检任务分配与路径规划算法[J]. 山东大学学报 (工学版), 2025, 55(5): 101-109.
[2] 王梅,宋凯文,刘勇,王志宝,万达. DMKK-means——一种深度多核K-means聚类算法[J]. 山东大学学报 (工学版), 2024, 54(6): 1-7.
[3] 王丽娟,徐晓,丁世飞. 面向密度峰值聚类的高效相似度度量[J]. 山东大学学报 (工学版), 2024, 54(3): 12-21.
[4] 张鑫,费可可. 基于log鲁棒核岭回归的子空间聚类算法[J]. 山东大学学报 (工学版), 2023, 53(6): 26-34.
[5] 李兆彬,叶军,周浩岩,卢岚,谢立. 变异萤火虫优化的粗糙K-均值聚类算法[J]. 山东大学学报 (工学版), 2023, 53(4): 74-82.
[6] 侯延琛,赵金东. 任意形状聚类的SPK-means算法[J]. 山东大学学报 (工学版), 2023, 53(2): 87-92.
[7] 程业超,刘惊雷. 自适应图正则的单步子空间聚类[J]. 山东大学学报 (工学版), 2022, 52(2): 57-66.
[8] 卢建云,张蔚,李林. 一种基于动态局部密度和聚类结构的聚类算法[J]. 山东大学学报 (工学版), 2022, 52(2): 118-127.
[9] 孟银凤,杨佳宇,曹付元. 函数型数据的分裂转移式层次聚类算法[J]. 山东大学学报 (工学版), 2022, 52(1): 19-27.
[10] 朱恒东, 马盈仓, 代雪珍. 自适应半监督邻域聚类算法[J]. 山东大学学报 (工学版), 2021, 51(4): 24-34.
[11] 朱昌明,岳闻,王盼红,沈震宇,周日贵. 主动三支聚类下的全局和局部多视角多标签学习算法[J]. 山东大学学报 (工学版), 2021, 51(2): 34-46.
[12] 解子奇,王立宏,李嫚. 块对角子空间聚类中成对约束的主动式学习[J]. 山东大学学报 (工学版), 2021, 51(2): 65-73.
[13] 李蓓,赵松,谢志佳,牛萌. 电动汽车虚拟储能可用容量建模[J]. 山东大学学报 (工学版), 2020, 50(6): 101-111.
[14] 董新宇,陈瀚阅,李家国,孟庆岩,邢世和,张黎明. 基于多方法融合的非监督彩色图像分割[J]. 山东大学学报 (工学版), 2019, 49(2): 96-101.
[15] 秦军,张远鹏,蒋亦樟,杭文龙. 多代表点自约束的模糊迁移聚类[J]. 山东大学学报 (工学版), 2019, 49(2): 107-115.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 王,张艳宁,申家振,刘俊成 . 基于信息测度和支持向量机的图像边缘检测[J]. 山东大学学报(工学版), 2006, 36(3): 95 -99 .
[2] 孙媛媛 徐衍亮 姚之宁. 旁磁制动单相感应电动机制动力的分析与计算[J]. 山东大学学报(工学版), 2009, 39(5): 120 -123 .
[3] 孟健, 李贻斌, 李彬. 四足机器人跳跃步态控制方法[J]. 山东大学学报(工学版), 2015, 45(3): 28 -34 .
[4] 穴洪涛,田国会,李晓磊,路飞 . QR Code在多种类物体识别与操作中的应用[J]. 山东大学学报(工学版), 2007, 37(6): 25 -30 .
[5] 贝广霞,楼佩煌,叶文华,王晓梅 . 精密加工中圆柱度在机检测关键技术[J]. 山东大学学报(工学版), 2007, 37(5): 65 -67 .
[6] 王佰伟,曹升乐 . 工业废水治理效果多目标评价方法研究[J]. 山东大学学报(工学版), 2007, 37(3): 89 -92 .
[7] 董彤 袁淑娟 葛军饴 洪芳 郁黎明 曹世勋 张金仓. 磁制冷材料Gd5Ge4中的磁玻璃态[J]. 山东大学学报(工学版), 2009, 39(3): 67 -70 .
[8] 王会青,孙宏伟,张建辉. 基于Map/Reduce的时间序列相似性搜索算法[J]. 山东大学学报(工学版), 2016, 46(1): 15 -21 .
[9] 廖伙木,董增川, 束龙仓,贠汝安 . 地下水位预报中的组合时间序列分析法[J]. 山东大学学报(工学版), 2008, 38(2): 96 -100 .
[10] 王岩青,钱承山,姜长生 . 中立型不确定变时滞系统的一个时滞相关鲁棒稳定性判据[J]. 山东大学学报(工学版), 2008, 38(1): 116 -120 .