您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2022, Vol. 52 ›› Issue (2): 118-127.doi: 10.6040/j.issn.1672-3961.0.2021.310

• • 上一篇    

一种基于动态局部密度和聚类结构的聚类算法

卢建云1,2,张蔚3,李林2   

  1. 1.重庆电子工程职业学院人工智能与大数据学院, 重庆 401331;2.电子科技大学计算机科学与工程学院, 四川 成都 611731;3.中国电子科技集团公司第二十九研究所, 四川 成都 610036
  • 发布日期:2022-04-20
  • 作者简介:卢建云(1982— ),男,河北沽源人,副教授,博士,主要研究方向为数据挖掘、机器学习、图神经网络. E-mail:lujianyun@uestc.edu.cn
  • 基金资助:
    四川省科学技术项目(2020YFH0037);重庆市技术创新与应用发展面上项目(cstc2019jscx-msxmX0035);重庆市科技局重大项目(cstc2018jszx-cyztzxX0034);重庆市教委科学技术项目(KJQN202103109)

A clustering algorithm based on dynamic local density and cluster structure

LU Jianyun1,2, ZHANG Wei3, LI Lin2   

  1. 1. School of Artificial Intelligence and Big Data, Chongqing College of Electronic Engineering, Chongqing 401331, China;
    2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China;
    3. The 29th Research Institute, China Electronics Technology Group Corporation, Chengdu 610036, Sichuan, China
  • Published:2022-04-20

摘要: 为提高逆k最近邻在度量局部密度时的区分度,提出动态逆k最近邻概念利用泊松概率密度函数拟合逆k最近邻分布,并计算累积动态逆k最近邻局部密度;基于动态局部密度对数据对象进行排序,利用逆k最近邻域扩展算法生成聚类结构;依据动态局部密度和欧式距离设计聚类决策图,根据决策图找出聚类结构中的类间间断点,利用间断点将聚类结构直接划分成独立的类簇将本研究提出的聚类结构划分聚类(cluster structure partition clustering,CSPC)算法与DBSCAN、DPC和RNN-DBSCAN算法在人工和真实数据集上进行试验对比,CSCP在人工和真实数据集上的评价指标F1平均分别提高8.8%和8.2%,评价指标标准互信息平均分别提高11.6%和7.3%。试验结果表明CSPC算法取得了更好的聚类结果。

关键词: 动态局部密度, 泊松概率密度函数, k最近邻, 聚类结构, 决策图

中图分类号: 

  • TP391
[1] 章永来, 周耀鉴.聚类算法综述[J].计算机应用, 2019, 39(7):1869-1882. ZHANG Yonglai, ZHOU Yaojian. Review ofclustering algorithms[J].Journal of Computer Applications, 2019, 39(7):1869-1882.
[2] UTTARKABAT S, SUNKARA N D, PATRA B K. RSOD: efficient technique for outlier detection using reverse nearest neighbors statistics[C] //Proceedings of the 4th International Conference on Computational Intel-ligence and Networks(CINE).Kolkata, India:IEEE, 2020:1-6.
[3] KURSUN O. Spectral clustering with reverse soft k-nearest neighbor density estimation[C] // Proceedings of the 2010 International Joint Conference on Neural Networks(IJCNN). Barcelona, Spain: IEEE, 2010: 1-8.
[4] BRYANT A, COIL K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J].IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6):1109-1121.
[5] VADAPALLI S, VALLURI S R, KARLAPALEM K. A simple yet effective data clustering algorithm[C] //Proceedings of the Sixth International Conference on Data Mining(ICDM’06). Hong Kong, China: IEEE, 2006:1108-1112.
[6] WU Q, ZHANG Q, SUN R, et al. Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors[J]. Information Technology and Control, 2020, 49(3):395-411.
[7] LIU Y, LIU D, YU F, et al. A double-density clustering method based on "nearest to first in" strategy[J]. Symmetry-Basel, 2020, 12(5):747-764.
[8] LU J, ZHU Q. An Effective algorithm based on density clustering framework[J]. IEEE Access, 2017:4991-5000.
[9] MUTHUKRISHNAN S. Influence sets based on reverse nearest neighbor queries[C] //Proceedings of the International Conference on Management of Data.Dallas, United States: ACM, 2000:201-212.
[10] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C] // Proceedings of the 2nd International Conference on Knowledge Discovery(KDD).Portland, United States:AAAI press, 1996:226-231.
[11] SCITO VS KI R, SABO K. DBSCAN-like clustering method for various data densities[J].Pattern Analysis and Applications, 2020, 23(2):541-554.
[12] CHEN Y, ZHOU L, BOUGUILA N, et al. BLOCK-DBSCAN: fast clustering for large scale data[J].Pattern Recognition, 2021, 51(6):3939-3953.
[13] ANKERST M, BREUNIG M, KRIEGELH P, et al. OPTICS: ordering points to identify the clustering structure[C] //Proceedings of the International Conference on Management of Data. Nanjing, China: ACM, 1999:49-60.
[14] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[15] 刘颖莹,刘培玉,王智昊,等.一种基于密度峰值发现的文本聚类算法[J].山东大学学报(理学版), 2016, 51(1):65-70. LIU Yingying, LIU Peiyu, WANG Zhihao, et al. A text clustering algorithm based on find of density peaks [J].Journal of Shandong University(Natural Science), 2016, 51(1):65-70.
[16] DIAO Q, DAI Y, AN Q, et al. Clustering by detecting density peaks and assigning points by similarity-first search based on weighted k-nearest neighbors graph[J]. Complexity, 2020:1-17.
[17] ABBAS M, EL-ZOGHABI A, SHOUKRY A. DenMune: density peak based clustering using mutual nearest neighbors[J]. Pattern Recognition, 2020:1-18.
[18] 杨天鹏,徐鲲鹏,陈黎飞.非均匀数据的变异系数聚类算法[J].山东大学学报(工学版), 2018,48(3):140-146. YANG Tianpeng, XU Kunpeng, CHEN Lifei. Coef-ficient of variation clustering algorithm for non-uniform data[J]. Journal of Shandong University(Engineering Science), 2018, 48(3):140-146.
[19] 卢建云,朱庆生,吴全旺.一种启发式确定聚类数方法[J].小型微型计算机系统, 2018, 39(7):1381-1385. LU Jinayun, ZHU Qingsheng, WU Quanwang. Heuristic method of determining the number of clusters[J]. Journal of Chinese Computer Systems, 2018, 39(7):1381-1385.
[20] 纪霞,姚晟,赵鹏.相对邻域与剪枝策略优化的密度峰值聚类算法[J].自动化学报,2020,46(3):562-575. JI Xia, YAO Cheng, ZHAO Peng. Relative neigh-borhood and pruning strategy optimized density peaks clustering algorithm [J].Acta Automatica Sinica, 2020, 46(3):562-575.
[21] BAKOUCH H, KACHOUR M, NADARAJAH S. An extended Poisson distribution[J]. Communications in Statistics-Thoeryand Methods, 2013, 45(22):6746-6764.
[22] University of Eastern Finland. Clustering datasets: Shape sets[DB/OL].[2022-02]. https://cs.joensuu.fi/sipu/datasets/.
[23] LICHMAN M. UCI machine learning repository 2013. [DB/OL]. [2022-02]. http://archive.ics.uci.edu/ml.
[24] LIU X, CHENG H M, ZHANG Z Y, Evaluation of community detection methods[J].IEEE Transactions on Knowledge and Data Engineering, 2020, 32(9):1736-1746.
[1] 蒋桐雨,陈帆,和红杰. 基于非对称U型金字塔重建的轻量级人脸超分辨率网络[J]. 山东大学学报 (工学版), 2022, 52(1): 1-8, 18.
[2] 胡军,杨冬梅,刘立,钟福金. 融合节点状态信息的跨社交网络用户对齐[J]. 山东大学学报 (工学版), 2021, 51(6): 49-58.
[3] 梁晔,马楠,刘宏哲. 图像依赖的显著图融合方法[J]. 山东大学学报 (工学版), 2021, 51(4): 1-7.
[4] 宗欣露,杜佳圆. 基于多目标驱动人工蜂群算法的疏散仿真模型[J]. 山东大学学报 (工学版), 2021, 51(3): 1-6.
[5] 杨修远,彭韬,杨亮,林鸿飞. 基于知识蒸馏的自适应多领域情感分析[J]. 山东大学学报 (工学版), 2021, 51(3): 15-21.
[6] 傅桂霞,邹国锋,毛帅,潘金凤,尹丽菊. 融合Gabor特征与卷积特征的小样本行人重识别[J]. 山东大学学报 (工学版), 2021, 51(3): 22-29.
[7] 陶亮,刘宝宁,梁玮. 基于CNN-LSTM 混合模型的心律失常自动检测[J]. 山东大学学报 (工学版), 2021, 51(3): 30-36.
[8] 张俊三,程俏俏,万瑶,朱杰,张世栋. MIRGAN: 一种基于GAN的医学影像报告生成模型[J]. 山东大学学报 (工学版), 2021, 51(2): 9-18.
[9] 周风余,顾潘龙,万方,尹磊,贺家凯. 多运动视觉里程计的方法与技术[J]. 山东大学学报 (工学版), 2021, 51(1): 1-10.
[10] 王梅,薛成龙,张强. 基于秩空间差异的多核组合方法[J]. 山东大学学报 (工学版), 2021, 51(1): 108-113.
[11] 谢晓兰,王琦. 一种基于多目标的容器云任务调度算法[J]. 山东大学学报 (工学版), 2020, 50(4): 14-21.
[12] 蔡国永,贺歆灏,储阳阳. 基于空间注意力和卷积神经网络的视觉情感分析[J]. 山东大学学报 (工学版), 2020, 50(4): 8-13.
[13] 成科扬,孙爽,詹永照. 基于背景复杂度自适应距离阈值的修正SuBSENSE算法[J]. 山东大学学报 (工学版), 2020, 50(3): 38-44.
[14] 田枫,李欣,刘芳,李闯,孙小强,杜睿山. 基于多模态子空间学习的语义标签生成方法[J]. 山东大学学报 (工学版), 2020, 50(3): 31-37, 44.
[15] 马金平. 基于UART串口的多机通讯[J]. 山东大学学报 (工学版), 2020, 50(3): 24-30.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!