山东大学学报 (工学版) ›› 2025, Vol. 55 ›› Issue (2): 58-70.doi: 10.6040/j.issn.1672-3961.0.2024.045
• 机器学习与数据挖掘 • 上一篇
陈素根1,2,赵志忠1*
CHEN Sugen1,2, ZHAO Zhizhong1*
摘要: 针对密度峰值聚类算法定义的截断距离仅考虑样本全局分布,在样本分配时容易产生“多米诺骨牌”现象等问题,提出一种融合局部截断距离及小簇合并的密度峰值聚类算法。基于样本局部分布信息计算每个样本截断距离和局部密度,有利于准确获得复杂结构数据集上密度峰;根据样本决策值之间差值关系选择潜在密度峰并形成多个小簇;定义一种新的小簇间相似度,根据此相似度将小簇合并获得聚类结果,有效避免了“多米诺骨牌”现象。采用6个人工数据集和8个UCI数据集进行验证,所提算法在上述14个数据集上的标准化互信息、调整兰德系数和调整互信息平均值比5个对比算法平均提高18.15%、28.99%和20.22%,比原始密度峰值聚类算法提高30.06%,47.15%和31.90%,具有较好的聚类效果。
中图分类号:
[1] JING Wenbo, JIN Tian, XIANG Deliang. Fast superpixel-based clustering algorithm for SAR image segmentation[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19:1-5. [2] 颜长春, 廖俊. 中国平台经济发展水平评价指标体系构建与测度[J]. 统计与决策, 2023, 39(11): 5-10. YAN Changchun, LIAO Jun. Platform of China economic development level evaluation index system construction and measure[J]. Journal of Statistics and Decision, 2023, 33(11): 5-10. [3] ZHAO Yuan, FANG Zhaoyu, LIN Cuixiang, et al. RFCell: a gene selection approach for scRNA-seq clustering based on permutation and Random Forest[J]. Frontiers in Genetics, 2021, 12: 665843. [4] GAO Tengfei, CHEN Dan, TANG Yunbo, et al. Adaptive density peaks clustering: towards exploratory EEG analysis[J]. Knowledge-Based Systems, 2022, 240: 108123. [5] LI Chuanwei, CHEN Hongmei, LI Tianrui, et al. A stable community detection approach for complex network based on density peak clustering and label propagation[J]. Applied Intelligence, 2022, 52(2): 1188-1208. [6] GUAN X, TERADA Y. Sparse kernel k-means for high-dimensional data[J]. Pattern Recognition, 2023, 144: 109873. [7] WANG W, YANG J, MUNTZ R. STING: a statistical information grid approach to spatial data mining[C] //Proceedings of 23rd International Conference on Very Large Data Bases. San Francisco, USA: ACM, 1997: 186-195. [8] ANDREAS L, ERICH S. BETULA: fast clustering of large data with improved BIRCH CF-Trees[J]. Information Systems, 2022, 108: 101918. [9] HUANG X G, MA T F, LIU C, et al. GriT-DBSCAN: a spatial clustering algorithm for very large databases[J]. Pattern Recognition, 2023, 142: 109658. [10] FU Nan, NI Weiwei, HU Haibo, et al. Multidime-nsional grid-based clustering with local differential privacy[J]. Information Sciences, 2023, 623: 402-420. [11] SUN Mingchen, YANG Mengduo, LI Yingji, et al. Structural-aware motif-based prompt tuning for graph clustering[J]. Information Sciences, 2023, 649: 119643. [12] WU Chengmao, YU Dongxue. Generalized possibilistic c-means clustering with double weighting exponents[J]. Information Sciences, 2023, 645: 119283. [13] RODRIGUZE A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496. [14] WANG Y Z, QIAN J X, HASSAN M, et al. Density peak clustering algorithms: a review on the decade 2014—2023[J]. Expert Systems with Applications, 2024, 238: 121860. [15] WEI Xiuxiu, PENG Maosong, HUANG Huajuan, et al. An overview on density peaks clustering[J]. Neurocomputing, 2023, 554: 126633. [16] 陈叶旺, 申莲莲, 钟才明,等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2): 378-394. CHEN Yewang, SHEN Lianlian, ZHONG Caiming, et al. Survey on density peak clustering algorithm[J]. Journal of Computer Research and Development, 2020, 57(2): 378-394. [17] DING Shifei, DU Wei, LI Chao, et al. Density peaks clustering algorithm based on improved similarity and allocation strategy[J]. International Journal of Machine Learning and Cybernetics, 2023, 14(4): 1527-1542. [18] ZHOU Zhou, SI Gangquan, SUN Haodong, et al. A robust clustering algorithm based on the identification of core points and KNN kernel density estimation[J]. Expert Systems with Applications, 2022, 195: 116573. [19] YAN Huan, WANG Mingzhao, XIE Juanying. ANN-DPC: density peak clustering by finding the adaptive nearest neighbors[J]. Knowledge-Based Systems, 2024, 294: 111748. [20] YU Donghua, LIU Guojun, GUO Maozu, et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7: 34301-34317. [21] ZHAO J, WANG G, PAN J S, et al. Density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets[J]. Pattern Recognition, 2023, 139: 109406. [22] 赵嘉, 姚占峰, 吕莉, 等. 基于相互邻近度的密度峰值聚类算法[J]. 控制与决策, 2021, 36(3): 543-552. ZHAO Jia, YAO Zhanfeng, LÜ Li, et al. Density peaks clustering based on mutual neighbor degree[J]. Control and Decision, 2021, 36(3): 543-552. [23] GUAN Junyi, LI Sheng, HE Xiongxiong, et al. Fast hierarchical clustering of local density peaks via an association degree transfer method[J]. Neuro-computing, 2021, 445: 401-418. [24] GUO Wenjie, WANG Wenhai, ZHAO Shunping, et al. Density peak clustering with connectivity estimation[J]. Knowledge-Based Systems, 2022, 243: 108501. [25] QIAO Kaikai, CHEN Jiawei, DUAN Shukai. Self-adaptive two-stage density clustering method with fuzzy connectivity[J]. Applied Soft Computing, 2024, 154: 111355. [26] 赵嘉, 马清, 肖人彬,等. 面向流形数据的共享近邻密度峰值聚类算法[J]. 智能系统学报, 2023, 18(4): 719-730. ZHAO Jia, MA Qing, XIAO Renbin, et al. Shared neighbor density peak clustering algorithm for manifold data[J]. CAAI Transactions on Intelligent Systems, 2023, 18(4): 719-730. [27] CHENG Dongdong, ZHU Qingsheng, HUANG Jinlong, et al. Clustering with local density peaks-based minimum spanning tree[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(2): 374-387. [28] QIU Teng, LI Yongjie. Fast LDP-MST: an efficient density-peak-based clustering method for large-size datasets[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 4767-4780. [29] MILAAN P. Clustering datasets[EB/OL].(2022-12-10)[2023-10-11]. https://github.com/milaan9/Clustering-Datasets [30] BLAKE C L, MERZ C J. UCI repository of machine database[EB/OL].(2023-09-29)[2023-10-15]. https://archive.ics.uci.edu/datasets [31] VINH N, EPPS J, BAILEY J. Information theoretic measures for clustering comparison: variants, properties, normalization and correction for chance[J]. Journal of Machine Learning Research, 2010, 11: 2837-2854. [32] DONALD W Z, BRUNO D Z. Relative power of the Wilcoxon Test, the Friedman Test, and Repeated-Measures ANOVA on Ranks[J]. The Journal of Experimental Education, 1993, 62(1): 75-86. |
[1] | 王梅,宋凯文,刘勇,王志宝,万达. DMKK-means——一种深度多核K-means聚类算法[J]. 山东大学学报 (工学版), 2024, 54(6): 1-7. |
[2] | 王丽娟,徐晓,丁世飞. 面向密度峰值聚类的高效相似度度量[J]. 山东大学学报 (工学版), 2024, 54(3): 12-21. |
[3] | 张鑫,费可可. 基于log鲁棒核岭回归的子空间聚类算法[J]. 山东大学学报 (工学版), 2023, 53(6): 26-34. |
[4] | 李兆彬,叶军,周浩岩,卢岚,谢立. 变异萤火虫优化的粗糙K-均值聚类算法[J]. 山东大学学报 (工学版), 2023, 53(4): 74-82. |
[5] | 侯延琛,赵金东. 任意形状聚类的SPK-means算法[J]. 山东大学学报 (工学版), 2023, 53(2): 87-92. |
[6] | 卢建云,张蔚,李林. 一种基于动态局部密度和聚类结构的聚类算法[J]. 山东大学学报 (工学版), 2022, 52(2): 118-127. |
[7] | 程业超,刘惊雷. 自适应图正则的单步子空间聚类[J]. 山东大学学报 (工学版), 2022, 52(2): 57-66. |
[8] | 孟银凤,杨佳宇,曹付元. 函数型数据的分裂转移式层次聚类算法[J]. 山东大学学报 (工学版), 2022, 52(1): 19-27. |
[9] | 朱恒东, 马盈仓, 代雪珍. 自适应半监督邻域聚类算法[J]. 山东大学学报 (工学版), 2021, 51(4): 24-34. |
[10] | 解子奇,王立宏,李嫚. 块对角子空间聚类中成对约束的主动式学习[J]. 山东大学学报 (工学版), 2021, 51(2): 65-73. |
[11] | 朱昌明,岳闻,王盼红,沈震宇,周日贵. 主动三支聚类下的全局和局部多视角多标签学习算法[J]. 山东大学学报 (工学版), 2021, 51(2): 34-46. |
[12] | 李蓓,赵松,谢志佳,牛萌. 电动汽车虚拟储能可用容量建模[J]. 山东大学学报 (工学版), 2020, 50(6): 101-111. |
[13] | 秦军,张远鹏,蒋亦樟,杭文龙. 多代表点自约束的模糊迁移聚类[J]. 山东大学学报 (工学版), 2019, 49(2): 107-115. |
[14] | 董新宇,陈瀚阅,李家国,孟庆岩,邢世和,张黎明. 基于多方法融合的非监督彩色图像分割[J]. 山东大学学报 (工学版), 2019, 49(2): 96-101. |
[15] | 朱映雪,黄瑞章,马灿. 一种具有新主题偏向性的短文本动态聚类方法[J]. 山东大学学报 (工学版), 2018, 48(6): 8-18. |
Viewed | ||||||||||||||||||||||||||||||||||||||||||||||
Full text 5
|
|
|||||||||||||||||||||||||||||||||||||||||||||
Abstract 18
|
|
|||||||||||||||||||||||||||||||||||||||||||||
Cited |
|
|||||||||||||||||||||||||||||||||||||||||||||
Shared | ||||||||||||||||||||||||||||||||||||||||||||||
Discussed |
|