山东大学学报 (工学版) ›› 2022, Vol. 52 ›› Issue (2): 118-127.doi: 10.6040/j.issn.1672-3961.0.2021.310
• • 上一篇
卢建云1,2,张蔚3,李林2
LU Jianyun1,2, ZHANG Wei3, LI Lin2
摘要: 为提高逆k最近邻在度量局部密度时的区分度,提出动态逆k最近邻概念。利用泊松概率密度函数拟合逆k最近邻分布,并计算累积动态逆k最近邻局部密度;基于动态局部密度对数据对象进行排序,利用逆k最近邻域扩展算法生成聚类结构;依据动态局部密度和欧式距离设计聚类决策图,根据决策图找出聚类结构中的类间间断点,利用间断点将聚类结构直接划分成独立的类簇。将本研究提出的聚类结构划分聚类(cluster structure partition clustering,CSPC)算法与DBSCAN、DPC和RNN-DBSCAN算法在人工和真实数据集上进行试验对比,CSCP在人工和真实数据集上的评价指标F1平均分别提高8.8%和8.2%,评价指标标准互信息平均分别提高11.6%和7.3%。试验结果表明CSPC算法取得了更好的聚类结果。
中图分类号:
[1] 章永来, 周耀鉴.聚类算法综述[J].计算机应用, 2019, 39(7):1869-1882. ZHANG Yonglai, ZHOU Yaojian. Review ofclustering algorithms[J].Journal of Computer Applications, 2019, 39(7):1869-1882. [2] UTTARKABAT S, SUNKARA N D, PATRA B K. RSOD: efficient technique for outlier detection using reverse nearest neighbors statistics[C] //Proceedings of the 4th International Conference on Computational Intel-ligence and Networks(CINE).Kolkata, India:IEEE, 2020:1-6. [3] KURSUN O. Spectral clustering with reverse soft k-nearest neighbor density estimation[C] // Proceedings of the 2010 International Joint Conference on Neural Networks(IJCNN). Barcelona, Spain: IEEE, 2010: 1-8. [4] BRYANT A, COIL K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J].IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6):1109-1121. [5] VADAPALLI S, VALLURI S R, KARLAPALEM K. A simple yet effective data clustering algorithm[C] //Proceedings of the Sixth International Conference on Data Mining(ICDM’06). Hong Kong, China: IEEE, 2006:1108-1112. [6] WU Q, ZHANG Q, SUN R, et al. Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors[J]. Information Technology and Control, 2020, 49(3):395-411. [7] LIU Y, LIU D, YU F, et al. A double-density clustering method based on "nearest to first in" strategy[J]. Symmetry-Basel, 2020, 12(5):747-764. [8] LU J, ZHU Q. An Effective algorithm based on density clustering framework[J]. IEEE Access, 2017:4991-5000. [9] MUTHUKRISHNAN S. Influence sets based on reverse nearest neighbor queries[C] //Proceedings of the International Conference on Management of Data.Dallas, United States: ACM, 2000:201-212. [10] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C] // Proceedings of the 2nd International Conference on Knowledge Discovery(KDD).Portland, United States:AAAI press, 1996:226-231. [11] SCITO VS KI R, SABO K. DBSCAN-like clustering method for various data densities[J].Pattern Analysis and Applications, 2020, 23(2):541-554. [12] CHEN Y, ZHOU L, BOUGUILA N, et al. BLOCK-DBSCAN: fast clustering for large scale data[J].Pattern Recognition, 2021, 51(6):3939-3953. [13] ANKERST M, BREUNIG M, KRIEGELH P, et al. OPTICS: ordering points to identify the clustering structure[C] //Proceedings of the International Conference on Management of Data. Nanjing, China: ACM, 1999:49-60. [14] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496. [15] 刘颖莹,刘培玉,王智昊,等.一种基于密度峰值发现的文本聚类算法[J].山东大学学报(理学版), 2016, 51(1):65-70. LIU Yingying, LIU Peiyu, WANG Zhihao, et al. A text clustering algorithm based on find of density peaks [J].Journal of Shandong University(Natural Science), 2016, 51(1):65-70. [16] DIAO Q, DAI Y, AN Q, et al. Clustering by detecting density peaks and assigning points by similarity-first search based on weighted k-nearest neighbors graph[J]. Complexity, 2020:1-17. [17] ABBAS M, EL-ZOGHABI A, SHOUKRY A. DenMune: density peak based clustering using mutual nearest neighbors[J]. Pattern Recognition, 2020:1-18. [18] 杨天鹏,徐鲲鹏,陈黎飞.非均匀数据的变异系数聚类算法[J].山东大学学报(工学版), 2018,48(3):140-146. YANG Tianpeng, XU Kunpeng, CHEN Lifei. Coef-ficient of variation clustering algorithm for non-uniform data[J]. Journal of Shandong University(Engineering Science), 2018, 48(3):140-146. [19] 卢建云,朱庆生,吴全旺.一种启发式确定聚类数方法[J].小型微型计算机系统, 2018, 39(7):1381-1385. LU Jinayun, ZHU Qingsheng, WU Quanwang. Heuristic method of determining the number of clusters[J]. Journal of Chinese Computer Systems, 2018, 39(7):1381-1385. [20] 纪霞,姚晟,赵鹏.相对邻域与剪枝策略优化的密度峰值聚类算法[J].自动化学报,2020,46(3):562-575. JI Xia, YAO Cheng, ZHAO Peng. Relative neigh-borhood and pruning strategy optimized density peaks clustering algorithm [J].Acta Automatica Sinica, 2020, 46(3):562-575. [21] BAKOUCH H, KACHOUR M, NADARAJAH S. An extended Poisson distribution[J]. Communications in Statistics-Thoeryand Methods, 2013, 45(22):6746-6764. [22] University of Eastern Finland. Clustering datasets: Shape sets[DB/OL].[2022-02]. https://cs.joensuu.fi/sipu/datasets/. [23] LICHMAN M. UCI machine learning repository 2013. [DB/OL]. [2022-02]. http://archive.ics.uci.edu/ml. [24] LIU X, CHENG H M, ZHANG Z Y, Evaluation of community detection methods[J].IEEE Transactions on Knowledge and Data Engineering, 2020, 32(9):1736-1746. |
[1] | 蒋桐雨,陈帆,和红杰. 基于非对称U型金字塔重建的轻量级人脸超分辨率网络[J]. 山东大学学报 (工学版), 2022, 52(1): 1-8, 18. |
[2] | 胡军,杨冬梅,刘立,钟福金. 融合节点状态信息的跨社交网络用户对齐[J]. 山东大学学报 (工学版), 2021, 51(6): 49-58. |
[3] | 梁晔,马楠,刘宏哲. 图像依赖的显著图融合方法[J]. 山东大学学报 (工学版), 2021, 51(4): 1-7. |
[4] | 宗欣露,杜佳圆. 基于多目标驱动人工蜂群算法的疏散仿真模型[J]. 山东大学学报 (工学版), 2021, 51(3): 1-6. |
[5] | 杨修远,彭韬,杨亮,林鸿飞. 基于知识蒸馏的自适应多领域情感分析[J]. 山东大学学报 (工学版), 2021, 51(3): 15-21. |
[6] | 傅桂霞,邹国锋,毛帅,潘金凤,尹丽菊. 融合Gabor特征与卷积特征的小样本行人重识别[J]. 山东大学学报 (工学版), 2021, 51(3): 22-29. |
[7] | 陶亮,刘宝宁,梁玮. 基于CNN-LSTM 混合模型的心律失常自动检测[J]. 山东大学学报 (工学版), 2021, 51(3): 30-36. |
[8] | 张俊三,程俏俏,万瑶,朱杰,张世栋. MIRGAN: 一种基于GAN的医学影像报告生成模型[J]. 山东大学学报 (工学版), 2021, 51(2): 9-18. |
[9] | 周风余,顾潘龙,万方,尹磊,贺家凯. 多运动视觉里程计的方法与技术[J]. 山东大学学报 (工学版), 2021, 51(1): 1-10. |
[10] | 王梅,薛成龙,张强. 基于秩空间差异的多核组合方法[J]. 山东大学学报 (工学版), 2021, 51(1): 108-113. |
[11] | 谢晓兰,王琦. 一种基于多目标的容器云任务调度算法[J]. 山东大学学报 (工学版), 2020, 50(4): 14-21. |
[12] | 蔡国永,贺歆灏,储阳阳. 基于空间注意力和卷积神经网络的视觉情感分析[J]. 山东大学学报 (工学版), 2020, 50(4): 8-13. |
[13] | 成科扬,孙爽,詹永照. 基于背景复杂度自适应距离阈值的修正SuBSENSE算法[J]. 山东大学学报 (工学版), 2020, 50(3): 38-44. |
[14] | 田枫,李欣,刘芳,李闯,孙小强,杜睿山. 基于多模态子空间学习的语义标签生成方法[J]. 山东大学学报 (工学版), 2020, 50(3): 31-37, 44. |
[15] | 马金平. 基于UART串口的多机通讯[J]. 山东大学学报 (工学版), 2020, 50(3): 24-30. |
|