
Journal of Shandong University (Engineering Science), 2024, Vol. 54, Issue 6: 1-7. doi: 10.6040/j.issn.1672-3961.0.2023.157

• Machine Learning and Data Mining •

DMKK-means: a deep multiple kernel K-means clustering algorithm

WANG Mei (1, 2), SONG Kaiwen (1), LIU Yong (3, 4)*, WANG Zhibao (1), WAN Da (1)

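  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China;
    2. Heilongjiang Key Laboratory of Petroleum Big Data and Intelligent Analysis, Daqing 163318, Heilongjiang, China;
    3. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100049, China;
    4. Beijing Key Laboratory of Big Data Management and Analysis Methods (School of Information, Renmin University of China), Beijing 100049, China
  • Published: 2024-12-26
  • About the authors: WANG Mei (born 1976), female, from Baoding, Hebei; professor, master's supervisor, PhD; research interests: machine learning, model selection, and kernel methods. E-mail: wangmei@nepu.edu.cn. *Corresponding author: LIU Yong (born 1986), male, from Yiyang, Hunan; associate research fellow, doctoral supervisor, PhD; research interests: large-scale machine learning and statistical machine learning theory. E-mail: liuyonggsai@ruc.edu.cn
  • Funding:
    National Natural Science Foundation of China (51774090, 62076234); Heilongjiang Postdoctoral Research Start-up Fund (LBH-Q20080); Natural Science Foundation of Heilongjiang Province (LH2020F003); Fundamental Research Funds for Heilongjiang Provincial Universities (KYCXTD201903, YYYZX202105)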


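Abstract: Conventional K-means has two weaknesses. Its clustering quality is easily affected by the distribution of the samples. The limited representational power of standard kernel functions also leads to poor clustering on complex problems. To address both issues, this paper proposes the deep multiple kernel K-means (DMKK-means) clustering algorithm. It exploits the strong representational ability of deep kernels and combines multiple kernels through an ensemble, which makes it both highly expressive and robust to the data distribution. First, a deep multiple kernel network with strong representational ability is constructed, and K-means clustering is performed in the new feature space it induces. Second, a clustering loss based on the Kullback-Leibler (KL) divergence measures the discrepancy between the proposed algorithm and two baseline clustering methods. Third, the clustering algorithm is formulated as an efficient end-to-end learning problem, and the weight parameters of the deep multiple kernel network are updated by stochastic gradient descent. Experiments were run on several benchmark datasets. The proposed algorithm was compared with K-means, radial basis function kernel K-means (RBFKKM), and other multiple kernel K-means clustering algorithms. It clearly improves clustering accuracy, normalized mutual information, and adjusted Rand index, which verifies its feasibility and effectiveness.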
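The central step in the abstract is running K-means in the feature space induced by a combination of kernels. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' DMKK-means:

- The base-kernel widths `gammas`, the combination `weights`, and the second-layer width `gamma2` are hand-picked hypothetical values. DMKK-means instead learns its network weights end to end by stochastic gradient descent under a KL-divergence clustering loss, which is not implemented here.
- The two-layer construction (an RBF applied to the distance induced by a convex combination of RBF kernels) is one simple way to stack kernels, chosen here for illustration.
- The helper names `rbf`, `deep_multi_kernel`, `feature_dist`, and `kernel_kmeans` are hypothetical.

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) base kernel exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def deep_multi_kernel(X, gammas, weights, gamma2):
    """Hypothetical two-layer multiple kernel matrix (fixed here, not learned).

    Layer 1: convex combination of RBF kernels with different widths.
    Layer 2: RBF applied to the layer-1 feature-space distance
             d^2(x, y) = k1(x, x) + k1(y, y) - 2 * k1(x, y).
    """
    n = len(X)
    k1 = [[sum(w * rbf(X[i], X[j], g) for w, g in zip(weights, gammas))
           for j in range(n)] for i in range(n)]
    return [[math.exp(-gamma2 * (k1[i][i] + k1[j][j] - 2 * k1[i][j]))
             for j in range(n)] for i in range(n)]


def feature_dist(K, i, j):
    """Squared distance between phi(x_i) and phi(x_j) via the kernel trick."""
    return K[i][i] + K[j][j] - 2 * K[i][j]


def kernel_kmeans(K, k, iters=100):
    """Lloyd-style kernel K-means on a precomputed kernel matrix K."""
    n = len(K)
    # Deterministic farthest-point seeding in the kernel-induced feature space.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n),
                         key=lambda i: min(feature_dist(K, i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: feature_dist(K, i, seeds[c]))
              for i in range(n)]

    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|C|^2) * sum_{a,b in C} K_ab is the squared norm of the cluster mean.
        mean_sq = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else None
                   for m in clusters]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, members in enumerate(clusters):
                if not members:
                    continue  # skip clusters that became empty
                # ||phi(x_i) - mu_c||^2 expanded with the kernel trick
                d = (K[i][i] - 2 * sum(K[i][j] for j in members) / len(members)
                     + mean_sq[c])
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

The cluster means in feature space are never formed explicitly; every distance is expressed through entries of `K`. Passing a single RBF kernel matrix to `kernel_kmeans` gives plain RBF kernel K-means, which roughly corresponds to the RBFKKM baseline named in the abstract.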

Keywords: K-means; kernel clustering; deep multiple kernel learning; data mining; gradient descent

CLC number: TP391