Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (2): 58-70. doi: 10.6040/j.issn.1672-3961.0.2024.045

• Machine Learning & Data Mining •

Density peak clustering combining local truncation distance and small clusters merging

CHEN Sugen1,2, ZHAO Zhizhong1*   

  1. School of Mathematics and Physics, Anqing Normal University, Anqing 246133, Anhui, China;
  2. Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing 246133, Anhui, China
  • Published: 2025-04-15

Abstract: To address two problems of the density peak clustering algorithm, namely that its truncation distance considers only the global distribution of samples and that the "domino" phenomenon occurs easily when samples are assigned, a novel density peak clustering algorithm combining local truncation distance and small-cluster merging was proposed. The truncation distance and local density of each sample were calculated from the local distribution information of the samples, which helped to identify density peaks accurately on datasets with complex structure. Potential density peaks were selected based on the differences between the samples' decision values, and multiple small clusters were formed. A new similarity between clusters was defined, and the small clusters were merged according to this similarity to obtain the clustering results, which effectively avoided the "domino" phenomenon. The proposed algorithm was compared with five clustering algorithms on six synthetic datasets and eight UCI datasets. Averaged over the 14 datasets, its normalized mutual information, adjusted Rand index and adjusted mutual information were 18.15%, 28.99% and 20.22% higher, respectively, than the averages of the five comparison algorithms, and 30.06%, 47.15% and 31.90% higher than those of the original density peak clustering algorithm. The experimental results showed that the proposed algorithm had a good clustering effect.
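
For orientation, the baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the original density peak clustering quantities with a single global truncation distance (local density, distance to the nearest denser sample, and their product as the decision value). It is not the proposed method: the abstract does not give the formulas for the local truncation distance or the cluster similarity. The function name `dpc_decision_values` and the parameter `d_c` are illustrative assumptions.

```python
import math


def dpc_decision_values(points, d_c):
    """Baseline DPC quantities using one global truncation distance d_c.

    Illustrative sketch only; the local truncation distance of the
    proposed algorithm is not reproduced here.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density (cutoff kernel): number of other samples closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        # Distances to all samples with strictly higher local density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Samples of maximal density take their largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))

    # Decision value: large for samples that are both dense and isolated
    # from denser samples, i.e. candidate density peaks.
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma
```

Because `d_c` is shared by all samples here, a value suited to a dense region may be too small for a sparse one, which is the limitation that motivates computing the truncation distance locally.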

Key words: clustering, density peak clustering, truncation distance, local density, potential density peaks

CLC Number: TP391