Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 48-53. doi: 10.6040/j.issn.1672-3961.0.2017.434

Incremental multi-view clustering algorithm based on kernel K-means

ZHANG Peirui, YANG Yan*, XING Huanlai, YU Xiuying

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
  • Received: 2017-05-05 Online: 2018-06-20 Published: 2017-05-05
  • Corresponding author: YANG Yan (born 1964), female, from Hefei, Anhui; Ph.D. in engineering, doctoral supervisor; main research interests: data mining and ensemble learning. E-mail: yyang@swjtu.edu.cn
  • About the author: ZHANG Peirui (born 1992), female, from Xingtai, Hebei; master's degree; main research interests: databases and data mining. E-mail: 15803118626@139.com
  • Supported by: National Natural Science Foundation of China (61572407); National Key Technology R&D Program of China (2015BAH19F02)

Abstract: The kernel-based multi-view clustering algorithm (MVKKM) takes a long time to run on large-scale datasets. To address this, the concept of an incremental clustering model is introduced and combined with MVKKM, yielding an incremental multi-view clustering algorithm based on kernel K-means (IMVKKM; abbreviated IMVCKM in the Chinese abstract). The dataset is divided into chunks, and MVKKM is run on each chunk to obtain a set of cluster centers, which serve as the initial cluster centers for the next chunk. The cluster centers of all chunks are then combined and clustered once more with MVKKM to produce the final clustering result. Experiments on three large-scale datasets show that IMVKKM achieves better clustering results than MVKKM on three evaluation metrics, with a shorter running time. The proposed approach substantially reduces running time while preserving clustering performance.

Key words: multi-view clustering, multi-view kernel K-means, incremental clustering, cluster center, data chunk, kernel function
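
The chunk-by-chunk scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain Lloyd's K-means on a single feature matrix stands in for MVKKM (no kernels, no view weights), and the names `nearest`, `kmeans`, `incremental_clustering`, and `chunk_size` are hypothetical. It also assumes the first chunk holds at least `k` points.

```python
import numpy as np


def nearest(X, centers):
    """Index of the closest center (squared Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, init=None, n_iter=50, seed=0):
    """Plain Lloyd's K-means; a stand-in for MVKKM (assumption)."""
    rng = np.random.default_rng(seed)
    if init is None:
        # Random initial centers drawn from the data (needs len(X) >= k).
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        labels = nearest(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest(X, centers)


def incremental_clustering(X, k, chunk_size, init=None, seed=0):
    """Cluster X chunk by chunk, warm-starting each chunk from the last."""
    all_centers = []
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        centers, _ = kmeans(chunk, k, init=init, seed=seed)
        all_centers.append(centers)
        init = centers  # centers of this chunk initialize the next chunk
    # Combine the centers of all chunks and cluster them once more.
    pooled = np.vstack(all_centers)
    final_centers, _ = kmeans(pooled, k, init=init, seed=seed)
    # Assign every original point to its nearest final center.
    return final_centers, nearest(X, final_centers)
```

Each call to `kmeans` touches only one chunk (or the small pooled set of centers), which is where the running-time saving over clustering the whole dataset at once would come from in a scheme of this kind.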

CLC number: TP391