山东大学学报 (工学版) ›› 2024, Vol. 54 ›› Issue (3): 12-21.doi: 10.6040/j.issn.1672-3961.0.2023.107
• 机器学习与数据挖掘 • 上一篇
王丽娟1,2,徐晓1*,丁世飞1
WANG Lijuan1,2, XU Xiao1*, DING Shifei1
摘要: 针对密度峰值聚类(density peaks clustering, DPC)计算复杂度高的问题,提出一种面向密度峰值聚类的高效相似度度量(efficient similarity measure, ESM)法,通过仅度量最近邻之间的相似度构建不完全相似度矩阵。最近邻的选择基于一个随机第三方数据对象,无需另外引入参数。基于ESM法构建相似度矩阵,提出一种改进的高效密度峰值聚类(efficient density peaks clustering, EDPC)算法,在保持准确率的同时提高DPC识别聚类中心的效率。理论分析和试验结果表明,ESM法通过减少一定不相似的相似度,可以有效提高DPC及其改进算法基于K最近邻的密度峰值聚类(density peaks clustering based on K-nearest neighbors, DPC-KNN)和模糊加权K最近邻密度峰值聚类(fuzzy weighted K-nearest neighbors density peaks clustering, FKNN-DPC)的计算效率,具有较强的可扩展性。
中图分类号:
[1] CHEN J G, PHILIP S Y. A domain adaptive density clustering algorithm for data with varying density distribution[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6):2310-2321. [2] 贾洪杰, 丁世飞, 史忠植. 求解大规模谱聚类的近似加权核k-means算法[J]. 软件学报, 2015, 26(11):2836-2846. JIA Hongjie, DING Shifei, SHI Zhongzhi. Approximate weighted kernel k-means for large-scale spectral clustering[J]. Journal of Software, 2015, 26(11):2836-2846. [3] YAN X Q, YE Y D, QIU X Y, et al. Synergetic information bottleneck for joint multi-view and ensemble clustering[J]. Information Fusion, 2020, 56:15-27. [4] XIE D Y, GAO Q X, WANG Q Q, et al. Adaptive latent similarity learning for multi-view clustering[J]. Neural Networks, 2020, 121:409-418. [5] QU H, MA T, TONG X Y, et al. Clustering by centroid drift and boundary shrinkage[J]. Pattern Recognition, 2022, 129:108745. [6] BARANWAL M, SALAPAKA S. Clustering and supervisory voltage control in power systems[J]. International Journal of Electrical Power & Energy Systems, 2019, 109:641-651. [7] POTHULA K, SMYRNOVA D, SCHRÖDER G. Clustering cryo-EM images of helical protein polymers for helical reconstructions[J]. Ultramicroscopy, 2019, 203:132-138. [8] SHI Y C, OTTO C, JAIN A. Face clustering: representation and pairwise constraints[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(7):1626-1640. [9] LLOBELL F, VIGNEAU E, QANNARI E. Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics[J]. Food Quality and Preference, 2019, 75:97-104. [10] 史倩玉, 梁吉业, 赵兴旺. 一种不完备混合数据集成聚类算法[J]. 计算机研究与发展, 2016, 53(9):1979-1989. SHI Qianyu, LIANG Jiye, ZHAO Xingwang. A clustering ensemble algorithm for incomplete mixed data[J]. Journal of Computer Research and Development, 2016, 53(9):1979-1989. [11] CHEN Y W, TANG S Y, BOUGUILA N, et al. A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data[J]. Pattern Recognition, 2018, 83:375-387. [12] BRYANT A, CIOS K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6):1109-1121. [13] RODRÍGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496. [14] DU M J, DING S F, JIA H J. Study on density peaks clustering based on K-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99:135-145. [15] XIE J Y, GAO H C, XIE W X, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354:19-40. [16] LIU R, WANG H, YU X M. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450:200-226. [17] DING S F, DU W, XU X, et al. An improved density peaks clustering algorithm based on natural neighbor with a merging strategy[J]. Information Sciences, 2023, 624:252-276. [18] WANG M, MIN F, ZHANG Z H, et al. Active learning through density clustering[J]. Expert Systems with Applications, 2017, 85:305-317. [19] XU J, WANG G Y, LI T R, et al. Fat node leading tree for data stream clustering with density peaks[J]. Knowledge-Based Systems, 2017, 120:99-117. [20] XU J, WANG G Y, DENG W H. DenPEHC:density peak based efficient hierarchical clustering[J]. Infor-mation Sciences, 2016, 373:200-218. [21] WU B, WILAMOWSKI B. A fast density and grid based clustering method for data with arbitrary shapes and noise[J]. IEEE Transactions on Industrial Informatics, 2017, 13(4):1620-1628. [22] 巩树凤, 张岩峰. EDDPC:一种高效的分布式密度中心聚类算法[J]. 计算机研究与发展, 2016, 53(6):1400-1409. GONG Shufeng, ZHANG Yanfeng. EDDPC: an efficient distributed density peaks clustering algorithm[J]. Journal of Computer Research and Development, 2016, 53(6):1400-1409. [23] XU X, DING S F, DU M J, et al. DPCG: an efficient density peaks clustering algorithm based on grid[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(5):743-754. [24] 徐晓, 丁世飞, 孙统风, 等. 基于网格筛选的大规模密度峰值聚类算法[J]. 计算机研究与发展, 2018, 55(11):2419-2429. XU Xiao, DING Shifei, SUN Tongfeng, et al. Large-scale density peaks clustering algorithm based on sparse screening[J]. Journal of Computer Research and Development, 2018, 55(11):2419-2429. [25] 谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学:信息科学, 2016, 46(2):258-280. XIE Juanying, GAO Hongchao, XIE Weixin. K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Scientia Sinica Informationis, 2016, 46(2):258-280. [26] XU T F, JIANG J H. A graph adaptive density peaks clustering algorithm for automatic centroid selection and effective aggregation[J]. Expert Systems with Applications, 2022, 195:116539. [27] ZHANG Y F, CHEN S M, YU G. Efficient distributed density peaks for clustering large data sets in MapReduce[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(12):3218-3230. [28] MEHMOOD R, ZHANG G Z, BIE R F, et al. Clustering by fast search and find of density peaks via heat diffusion[J]. Neurocomputing, 2016, 208:210-217. [29] DING S F, LI C, XU X, et al. A sampling-based density peaks clustering algorithm for large-scale data[J]. Pattern Recognition, 2023, 136:109238. [30] RASOOL Z, ZHOU R, CHEN L, et al. Index-based solutions for efficient density peak clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(5):2212-2226. [31] JIANG J, YAN X, YU Z T, et al. A Chinese expert disambiguation method based on semi-supervised graph clustering[J]. International Journal of Machine Learning & Cybernetics, 2015, 6(2):197-204. |
[1] | 张佩瑞,杨燕,邢焕来,喻琇瑛. 基于核K-means的增量多视图聚类算法[J]. 山东大学学报(工学版), 2018, 48(3): 48-53. |
[2] | 田枫,刘卓炫,尚福华,沈旭昆,王梅,王浩畅. 基于语境相关图传播的图像标注改善方法[J]. 山东大学学报(工学版), 2016, 46(5): 1-6. |
[3] | 徐平安,唐雁,石教开,张辉荣. 基于薛定谔方程的K-Means聚类算法[J]. 山东大学学报(工学版), 2016, 46(1): 34-41. |
[4] | 翟俊海,高原原,王熙照,陈俊芬. 基于划分子集的属性约简算法[J]. 山东大学学报(工学版), 2011, 41(4): 24-28. |
|