山东大学学报(工学版) ›› 2016, Vol. 46 ›› Issue (5): 37-44.doi: 10.6040/j.issn.1672-3961.1.2016.031
周旺,张晨麟,吴建鑫
ZHOU Wang, ZHANG Chenlin, WU Jianxin
摘要: 基于传统的Hartigan-Wong聚类算法会产生不平衡聚类结果的缺点,提出一种新的聚类算法Charl,这种算法会改进聚类结果的平衡性但不要求绝对平衡。 结合Lloyd算法和Hartigan-Wong算法的思想,Charl算法采用一种自适应性的动态调整策略来调整平衡程度。跟Lloyd算法一样,Charl算法以批处理的方式更新中心,所以具有计算高效的性质。在13个数据集上进行的试验表明,Charl方法不仅产生了平衡的聚类结果,并且同时得到了比Lloyd算法更低的代价函数值和更好的聚类性能(聚类准确率、归一化互信息、聚类时间等)。这种定性平衡聚类算法也明显优于严格平衡的聚类算法。
中图分类号:
| [1] LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2):129-137. [2] BISHOP C M. Pattern recognition and machine learning[M]. Singapore: Springer, 2006. [3] 贺玲, 吴玲达, 蔡益朝. 数据挖掘中的聚类算法综述[J]. 计算机应用研究, 2007, 24(1):10-13. HE Ling, WU Lingda, CAI Yichao. Survey of clustering algorithms in data mining[J]. Application Research of Computers, 2007, 24(1):10-13. [4] 孙吉贵,刘杰,赵连宇.聚类算法研究[J]. 软件学报, 2008,19(1): 48-61. SUN Jigui, LIU Jie, ZHAO Lianyu. Clustering algorithms research[J]. Journal of Software, 2008, 19(1):48-61. [5] CHEN X, DENG C. Large scale spectral clustering with landmark-based representation[C] //Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2011:313-318. [6] HUANG J, NIE F, HUANG H. Spectral rotation versus k-means in spectral clustering [C] //Proceedings of the 27th AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI, 2013: 431-437. [7] 周林, 平西建, 徐森, 等. 基于谱聚类的聚类集成算法[J]. 自动化学报, 2012, 38(8):1335-1342. ZHOU Lin, PING Xijian, XU Sen, et al. Cluster ensemble based on spectral clustering[J]. Acta Automatica Sinica, 2012, 38(8):1335-1342. [8] WU J, LIU H, XIONG H, et al. A theoretic framework of k-means-based consensus clustering[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1799-1805. [9] XU R, WUNSCH D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3):645-678. [10] 王守强,朱大铭,史士英. K-means聚类问题的改进近似算法[J].山东大学学报(工学版), 2011, 41(4):125-132. WANG Shouqiang, ZHU Daming, SHI Shiying. Improved approximation algorithm for the K-means clustering problem[J]. Journal of Shandong University(Engineering Science), 2011, 41(4):125-132. [11] 贾洪杰,丁世飞,史忠植.求解大规模谱聚类的近似加权K-means算法[J].软件学报, 2015, 26(11):2836-2846. JIA Hongjie, DING Shifei, SHI Zhongzhi. Approximate weighted kernel K-means for large-scale spectral clustering[J]. Journal of Software, 2015, 26(11):2836- 2846. [12] 雷小锋, 谢昆青, 林帆,等. 一种基于 K-Means 局部最优性的高效聚类算法[J]. 软件学报, 2008, 19(7): 1683-1692. LEI Xiaofeng, XIE Kunqing, LIN Fan, et al. An efficient clustering algorithm based on local optimality of K-means[J]. Journal of Software, 2008, 19(7):1683-1692. [13] LI F, PERONA P. A bayesian hierarchical model for learning natural scene categories[C] //Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005:524-531. [14] JURIE F, TRIGGS B. Creating efficient codebooks for visual recognition[C] //Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE, 2005:604-610. [15] WU J, REHG J M. Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel[C] //Proceedings of the 12th Inter- national Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 630-637. [16] WU J, TAN W C, REHG J M. Efficient and effective visual codebook generation using additive kernels[J]. Journal of Machine Learning Research, 2011,12(9):3097-3118. [17] BANERJEE A, GHOSH J. Scalable clustering algorithms with balancing constraints[J]. Data Mining and Knowledge Discovery, 2006, 13(3):365-395. [18] ZHU S, WANG D, LI T. Data clustering with size constraints[J]. Knowledge-Based Systems, 2010, 23(8):883-889. [19] HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society(Series C, Applied Statistics), 1979, 28(1):100-108. [20] TELGARSKY M, VATTANI A. Hartigans Method: k-means clustering without Voronoi[C] //Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: JMLR, 2010:820-827. [21] SLONIM N, AHARONI E, CRAMMER K. Hartigans k-means versus Lloyds k-means: is it time for a change?[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1677-1684. [22] PAPADIMITRIOU C H, STEIGLITZ K. Combinatorial optimization: algorithms and complexity[M]. New York, USA: Dover Publications, 1998. [23] CAI D, HE X, HAN J. Document clustering using locality preserving indexing[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(12):1624-1637. [24] RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336):846-850. [25] JEGOU H, DOUZE M, SCHMID C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1):117-128. |
| [1] | 祝明,石承龙,吕潘,刘现荣,孙驰,陈建城,范宏运. 基于优化长短时记忆网络的深基坑变形预测方法及其工程应用[J]. 山东大学学报 (工学版), 2025, 55(3): 141-148. |
| [2] | 常新功,苏敏惠,周志刚. 基于进化集成的图神经网络解释方法[J]. 山东大学学报 (工学版), 2024, 54(4): 1-12. |
| [3] | 乔慧妍,段学龙,解驰皓,赵冬慧,马玉玲. 基于异常点检测的心理健康辅助诊断方法[J]. 山东大学学报 (工学版), 2024, 54(4): 76-85. |
| [4] | 刘新,刘冬兰,付婷,王勇,常英贤,姚洪磊,罗昕,王睿,张昊. 基于联邦学习的时间序列预测算法[J]. 山东大学学报 (工学版), 2024, 54(3): 55-63. |
| [5] | 岳仁峰,张嘉琦,刘勇,范学忠,李琮琮,孔令鑫. 基于颜色和纹理特征的立体车库锈蚀检测技术[J]. 山东大学学报 (工学版), 2024, 54(3): 64-69. |
| [6] | 陈成,董永权,贾瑞,刘源. 基于交互序列特征相关性的可解释知识追踪[J]. 山东大学学报 (工学版), 2024, 54(1): 100-108. |
| [7] | 卞小曼,王小琴,蓝如师,刘振丙,罗笑南. 基于相似性保持和判别性分析的快速视频哈希算法[J]. 山东大学学报 (工学版), 2023, 53(6): 63-69. |
| [8] | 李鸿钊,张庆松,刘人太,陈新,辛勤,石乐乐. 浅埋地铁车站施工期地表变形风险预警[J]. 山东大学学报 (工学版), 2023, 53(6): 82-91. |
| [9] | 袁高腾,周晓峰,郭宏乐. 基于特征选择算法的ECG信号分类[J]. 山东大学学报 (工学版), 2022, 52(4): 38-44. |
| [10] | 聂秀山,马玉玲,乔慧妍,郭杰,崔超然,于志云,刘兴波,尹义龙. 任务粒度视角下的学生成绩预测研究综述[J]. 山东大学学报 (工学版), 2022, 52(2): 1-14. |
| [11] | 孙鸿昌,周风余,单明珠,翟文文,牛兰强. 基于模式划分的空调能耗混合填补方法[J]. 山东大学学报 (工学版), 2022, 52(1): 9-18. |
| [12] | 袁高腾,刘毅慧,黄伟,胡兵. 基于Gabor特征的乳腺肿瘤MR图像分类识别模型[J]. 山东大学学报 (工学版), 2020, 50(3): 15-23. |
| [13] | 高铭壑,张莹,张蓉蓉,黄子豪,黄琳焱,李繁菀,张昕,王彦浩. 基于预测数据特征的空气质量预测方法[J]. 山东大学学报 (工学版), 2020, 50(2): 91-99. |
| [14] | 张大鹏,刘雅军,张伟,沈芬,杨建盛. 基于异质集成学习的虚假评论检测[J]. 山东大学学报 (工学版), 2020, 50(2): 1-9. |
| [15] | 刘玉田, 孙润稼, 王洪涛, 顾雪平. 人工智能在电力系统恢复中的应用综述[J]. 山东大学学报 (工学版), 2019, 49(5): 1-8. |
|