Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue 3: 48-53. doi: 10.6040/j.issn.1672-3961.0.2017.434
张佩瑞,杨燕*,邢焕来,喻琇瑛
ZHANG Peirui, YANG Yan*, XING Huanlai, YU Xiuying
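Abstract: The kernel based multi-view clustering method (MVKKM) takes a long time to run on large-scale datasets. To address this drawback, the concept of an incremental clustering model is introduced and combined with MVKKM, yielding an incremental multi-view clustering algorithm based on kernel K-means (IMVCKM). The dataset is divided into chunks, MVKKM is run on each chunk, and the cluster centers of each chunk serve as the initial cluster centers for the next chunk. The cluster centers of all chunks are then merged and clustered once more with multi-view clustering to obtain the final clustering result. Experiments on three large-scale datasets show that IMVCKM achieves better clustering results than MVKKM on three evaluation metrics while also running faster. The algorithm greatly reduces running time while preserving clustering performance.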
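The chunk-by-chunk flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: plain single-view Lloyd's K-means stands in for MVKKM (no kernels, no view weights), and the names `kmeans`, `incremental_clustering`, `chunk_size`, and `seed` are hypothetical. Drawing the first centers at random from the first chunk and labeling points by their nearest final center are also assumptions, since the abstract does not specify either step.

```python
import numpy as np


def kmeans(X, init_centers, n_iter=50):
    """Plain Lloyd's K-means from given centers (a stand-in for MVKKM)."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def incremental_clustering(X, k, chunk_size, seed=0):
    """Cluster X chunk by chunk, warm-starting each chunk from the previous one."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are sampled from the first chunk.
    first = X[:chunk_size]
    centers = first[rng.choice(len(first), size=k, replace=False)]
    pooled = []
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        # Centers of the previous chunk initialize the current chunk.
        centers = kmeans(chunk, centers)
        pooled.append(centers)
    pooled = np.vstack(pooled)
    # Final pass: cluster the pooled chunk centers once more.
    final_centers = kmeans(pooled, centers)
    # Assumption: each point is labeled by its nearest final center.
    d2 = ((X[:, None, :] - final_centers[None, :, :]) ** 2).sum(axis=2)
    return final_centers, d2.argmin(axis=1)
```

In this sketch each chunk is visited only once and the final pass operates on just `k` centers per chunk rather than on the full dataset, which is where a running-time saving of the kind the abstract reports would come from.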