Journal of Shandong University (Engineering Science), 2025, Vol. 55, Issue 2: 58-70. doi: 10.6040/j.issn.1672-3961.0.2024.045

Machine Learning & Data Mining

Density peak clustering combining local truncation distance and small clusters merging

CHEN Sugen1,2, ZHAO Zhizhong1*   

1. School of Mathematics and Physics, Anqing Normal University, Anqing 246133, Anhui, China;
2. Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing 246133, Anhui, China

Published: 2025-04-15

Abstract: The density peak clustering (DPC) algorithm defines its truncation distance using only the global distribution of the samples, and its sample-assignment step is prone to a "domino" effect. To address these two problems, a novel density peak clustering algorithm combining local truncation distance and small-cluster merging was proposed. The truncation distance and local density of each sample were calculated from the local distribution of the samples, which helps locate the density peaks accurately on datasets with complex structures. Potential density peaks were selected according to the differences between the samples' decision values, and multiple small clusters were formed around them. A new similarity measure between clusters was defined, and the clusters were merged according to this similarity to obtain the final clustering result, which effectively avoided the "domino" effect. The proposed algorithm was compared with five clustering algorithms on six synthetic datasets and eight UCI datasets. Averaged over the 14 datasets, its normalized mutual information (NMI), adjusted Rand index (ARI) and adjusted mutual information (AMI) were on average 18.15%, 28.99% and 20.22% higher, respectively, than those of the five comparison algorithms, and 30.06%, 47.15% and 31.90% higher than those of the original DPC algorithm. The experimental results showed that the proposed algorithm achieved good clustering performance.
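For readers unfamiliar with DPC, the baseline that this work modifies can be sketched in a few lines. The Python sketch below follows the general scheme of Rodriguez and Laio (2014). It computes a Gaussian-kernel local density ρ, which is a common variant of the original cutoff kernel. It then finds the distance δ from each sample to its nearest sample of higher density and the decision value γ = ρδ used to pick centres. Finally, it performs the one-step allocation in which every remaining sample takes the label of its nearest higher-density neighbour. The per-sample cutoff in `local_cutoffs` is only a hypothetical stand-in for a local truncation distance: it uses the distance to the k-th nearest neighbour and is not the definition used in the paper. The paper's potential-peak selection, inter-cluster similarity and small-cluster merging are not reproduced. All function and parameter names are illustrative.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def local_cutoffs(dist, k):
    """Per-sample truncation distance.
    ASSUMPTION: distance to the k-th nearest neighbour, used only to
    illustrate a *local* cutoff; this is not the paper's formula."""
    cutoffs = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        cutoffs.append(max(others[k - 1], 1e-12))  # guard against duplicate points
    return cutoffs

def dpc(points, n_clusters, k=5):
    """Density-peak-style clustering sketch with a hypothetical local cutoff
    and the standard one-step allocation (no small-cluster merging)."""
    n = len(points)
    dist = pairwise_distances(points)
    dc = local_cutoffs(dist, k)
    # Gaussian-kernel local density, scaled by each sample's own cutoff.
    rho = [sum(math.exp(-(dist[i][j] / dc[i]) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Visit samples from highest to lowest density (stable sort breaks ties by index).
    order = sorted(range(n), key=lambda s: -rho[s])
    delta = [0.0] * n
    nearest = [-1] * n  # nearest sample of higher density ("parent")
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda c: dist[i][c])
        delta[i], nearest[i] = dist[i][j], j
    gamma = [rho[s] * delta[s] for s in range(n)]  # decision value
    # The global density maximum has no parent, so it is always a centre;
    # the remaining centres are the samples with the largest decision values.
    ranked = [s for s in sorted(range(n), key=lambda s: -gamma[s]) if s != order[0]]
    centres = [order[0]] + ranked[:n_clusters - 1]
    labels = [-1] * n
    for c, idx in enumerate(centres):
        labels[idx] = c
    # One-step allocation: each non-centre inherits its parent's label.
    # Errors propagate along these parent chains (the "domino" effect).
    for s in order:
        if labels[s] == -1:
            labels[s] = labels[nearest[s]]
    return labels
```

As a usage example, take `blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]` and a copy `blob_b` shifted by (10, 10). Then `dpc(blob_a + blob_b, n_clusters=2, k=3)` should place each blob in its own cluster. Each label is copied along the chain of `nearest` links, so a single wrong link near a cluster boundary spreads to every sample downstream of it. This is the "domino" failure that the proposed small-cluster merging is intended to avoid.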

Key words: clustering, density peak clustering, truncation distance, local density, potential density peaks

CLC Number: TP391