面向密度峰值聚类的高效相似度度量

doi:10.6040/j.issn.1672-3961.0.2023.107

山东大学学报 (工学版) ›› 2024, Vol. 54 ›› Issue (3): 12-21.doi: 10.6040/j.issn.1672-3961.0.2023.107

• 机器学习与数据挖掘 • 上一篇

面向密度峰值聚类的高效相似度度量

王丽娟^1,2,徐晓^1*,丁世飞¹

1.中国矿业大学计算机科学技术学院, 江苏徐州 221116;2.徐州工业职业技术学院信息工程学院, 江苏徐州 221114

发布日期:2024-06-28
作者简介:王丽娟(1981— )女,江苏赣榆人,博士研究生,主要研究方向为机器学习和聚类分析. E-mail:327732566@qq.com. *通信作者简介:徐晓(1992— )女,江苏海门人,讲师,硕士生导师,博士,主要研究方向为数据挖掘和机器学习. E-mail:xu_xiao@cumt.edu.cn
基金资助:
国家自然科学基金资助项目(62206296);中央高校基本科研业务费专项资金资助项目(2022QN1095);江苏省高等职业院校专业带头人高端研修资助项目(2022GRFX063)

Efficient similarity measure for density peaks clustering

WANG Lijuan^1,2, XU Xiao^1*, DING Shifei¹

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China;
2. School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou 221114, Jiangsu, China

Published:2024-06-28

摘要/Abstract

摘要： 针对密度峰值聚类(density peaks clustering, DPC)计算复杂度高的问题,提出一种面向密度峰值聚类的高效相似度度量(efficient similarity measure, ESM)法,通过仅度量最近邻之间的相似度构建不完全相似度矩阵。最近邻的选择基于一个随机第三方数据对象,无需另外引入参数。基于ESM法构建相似度矩阵,提出一种改进的高效密度峰值聚类(efficient density peaks clustering, EDPC)算法,在保持准确率的同时提高DPC识别聚类中心的效率。理论分析和试验结果表明,ESM法通过减少一定不相似的相似度,可以有效提高DPC及其改进算法基于K最近邻的密度峰值聚类(density peaks clustering based on K-nearest neighbors, DPC-KNN)和模糊加权K最近邻密度峰值聚类(fuzzy weighted K-nearest neighbors density peaks clustering, FKNN-DPC)的计算效率,具有较强的可扩展性。

关键词: 密度峰值聚类, 聚类中心, 相似度矩阵, 计算复杂度, 大规模数据集

中图分类号:

TP391

王丽娟,徐晓,丁世飞. 面向密度峰值聚类的高效相似度度量[J]. 山东大学学报 (工学版), 2024, 54(3): 12-21.

WANG Lijuan, XU Xiao, DING Shifei. Efficient similarity measure for density peaks clustering[J]. Journal of Shandong University(Engineering Science), 2024, 54(3): 12-21.

参考文献

[1] CHEN J G, PHILIP S Y. A domain adaptive density clustering algorithm for data with varying density distribution[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6):2310-2321.
[2] 贾洪杰, 丁世飞, 史忠植. 求解大规模谱聚类的近似加权核k-means算法[J]. 软件学报, 2015, 26(11):2836-2846. JIA Hongjie, DING Shifei, SHI Zhongzhi. Approximate weighted kernel k-means for large-scale spectral clustering[J]. Journal of Software, 2015, 26(11):2836-2846.
[3] YAN X Q, YE Y D, QIU X Y, et al. Synergetic information bottleneck for joint multi-view and ensemble clustering[J]. Information Fusion, 2020, 56:15-27.
[4] XIE D Y, GAO Q X, WANG Q Q, et al. Adaptive latent similarity learning for multi-view clustering[J]. Neural Networks, 2020, 121:409-418.
[5] QU H, MA T, TONG X Y, et al. Clustering by centroid drift and boundary shrinkage[J]. Pattern Recognition, 2022, 129:108745.
[6] BARANWAL M, SALAPAKA S. Clustering and supervisory voltage control in power systems[J]. International Journal of Electrical Power & Energy Systems, 2019, 109:641-651.
[7] POTHULA K, SMYRNOVA D, SCHRÖDER G. Clustering cryo-EM images of helical protein polymers for helical reconstructions[J]. Ultramicroscopy, 2019, 203:132-138.
[8] SHI Y C, OTTO C, JAIN A. Face clustering: representation and pairwise constraints[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(7):1626-1640.
[9] LLOBELL F, VIGNEAU E, QANNARI E. Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics[J]. Food Quality and Preference, 2019, 75:97-104.
[10] 史倩玉, 梁吉业, 赵兴旺. 一种不完备混合数据集成聚类算法[J]. 计算机研究与发展, 2016, 53(9):1979-1989. SHI Qianyu, LIANG Jiye, ZHAO Xingwang. A clustering ensemble algorithm for incomplete mixed data[J]. Journal of Computer Research and Development, 2016, 53(9):1979-1989.
[11] CHEN Y W, TANG S Y, BOUGUILA N, et al. A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data[J]. Pattern Recognition, 2018, 83:375-387.
[12] BRYANT A, CIOS K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6):1109-1121.
[13] RODRÍGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[14] DU M J, DING S F, JIA H J. Study on density peaks clustering based on K-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99:135-145.
[15] XIE J Y, GAO H C, XIE W X, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354:19-40.
[16] LIU R, WANG H, YU X M. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450:200-226.
[17] DING S F, DU W, XU X, et al. An improved density peaks clustering algorithm based on natural neighbor with a merging strategy[J]. Information Sciences, 2023, 624:252-276.
[18] WANG M, MIN F, ZHANG Z H, et al. Active learning through density clustering[J]. Expert Systems with Applications, 2017, 85:305-317.
[19] XU J, WANG G Y, LI T R, et al. Fat node leading tree for data stream clustering with density peaks[J]. Knowledge-Based Systems, 2017, 120:99-117.
[20] XU J, WANG G Y, DENG W H. DenPEHC:density peak based efficient hierarchical clustering[J]. Infor-mation Sciences, 2016, 373:200-218.
[21] WU B, WILAMOWSKI B. A fast density and grid based clustering method for data with arbitrary shapes and noise[J]. IEEE Transactions on Industrial Informatics, 2017, 13(4):1620-1628.
[22] 巩树凤, 张岩峰. EDDPC:一种高效的分布式密度中心聚类算法[J]. 计算机研究与发展, 2016, 53(6):1400-1409. GONG Shufeng, ZHANG Yanfeng. EDDPC: an efficient distributed density peaks clustering algorithm[J]. Journal of Computer Research and Development, 2016, 53(6):1400-1409.
[23] XU X, DING S F, DU M J, et al. DPCG: an efficient density peaks clustering algorithm based on grid[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(5):743-754.
[24] 徐晓, 丁世飞, 孙统风, 等. 基于网格筛选的大规模密度峰值聚类算法[J]. 计算机研究与发展, 2018, 55(11):2419-2429. XU Xiao, DING Shifei, SUN Tongfeng, et al. Large-scale density peaks clustering algorithm based on sparse screening[J]. Journal of Computer Research and Development, 2018, 55(11):2419-2429.
[25] 谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学:信息科学, 2016, 46(2):258-280. XIE Juanying, GAO Hongchao, XIE Weixin. K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Scientia Sinica Informationis, 2016, 46(2):258-280.
[26] XU T F, JIANG J H. A graph adaptive density peaks clustering algorithm for automatic centroid selection and effective aggregation[J]. Expert Systems with Applications, 2022, 195:116539.
[27] ZHANG Y F, CHEN S M, YU G. Efficient distributed density peaks for clustering large data sets in MapReduce[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(12):3218-3230.
[28] MEHMOOD R, ZHANG G Z, BIE R F, et al. Clustering by fast search and find of density peaks via heat diffusion[J]. Neurocomputing, 2016, 208:210-217.
[29] DING S F, LI C, XU X, et al. A sampling-based density peaks clustering algorithm for large-scale data[J]. Pattern Recognition, 2023, 136:109238.
[30] RASOOL Z, ZHOU R, CHEN L, et al. Index-based solutions for efficient density peak clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(5):2212-2226.
[31] JIANG J, YAN X, YU Z T, et al. A Chinese expert disambiguation method based on semi-supervised graph clustering[J]. International Journal of Machine Learning & Cybernetics, 2015, 6(2):197-204.

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed

面向密度峰值聚类的高效相似度度量

Efficient similarity measure for density peaks clustering

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 4

多维度评价

本文评价

推荐阅读 0

[1]	张佩瑞,杨燕,邢焕来,喻琇瑛. 基于核K-means的增量多视图聚类算法[J]. 山东大学学报(工学版), 2018, 48(3): 48-53.
[2]	田枫,刘卓炫,尚福华,沈旭昆,王梅,王浩畅. 基于语境相关图传播的图像标注改善方法[J]. 山东大学学报(工学版), 2016, 46(5): 1-6.
[3]	徐平安,唐雁,石教开,张辉荣. 基于薛定谔方程的K-Means聚类算法[J]. 山东大学学报(工学版), 2016, 46(1): 34-41.
[4]	翟俊海,高原原,王熙照,陈俊芬. 基于划分子集的属性约简算法[J]. 山东大学学报(工学版), 2011, 41(4): 24-28.