JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2016, Vol. 46 ›› Issue (5): 37-44.doi: 10.6040/j.issn.1672-3961.1.2016.031

Previous Articles     Next Articles

Qualitative balanced clustering algorithm based on Hartigan-Wong and Lloyd

ZHOU Wang, ZHANG Chenlin, WU Jianxin   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, Jiangsu, China
  • Received:2016-03-01 Online:2016-10-20 Published:2016-03-01

Abstract: The traditional Hartigan-Wong clustering algorithm could cause the unbalanced clustering problem. To solve this problem, Charl which is a novel qualitative balanced clustering method was proposed to improve the balance level while the absolute balance was not required. Charl combined ideas from both the Lloyds method and the Hartigan-Wong method, Charl proposed an adaptive tuning strategy to tune the balance level. This algorithm was a batch processing method, which shared the efficiency benefits of the Lloyds method. Experiments on 13 benchmark datasets showed that Charl not only produced more balanced output groups, but also achieved lower cost values and higher clustering performances(in terms of accuracy, normal mutual information and time cost)than the Lloyds method. This qualitative balancing method also outperformed the quantitative balanced clustering method by a large margin.

Key words: qualitative balancing, Hartigan-Wong, balanced clustering, Lloyd, machine learning

CLC Number: 

  • TP391
[1] LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2):129-137.
[2] BISHOP C M. Pattern recognition and machine learning[M]. Singapore: Springer, 2006.
[3] 贺玲, 吴玲达, 蔡益朝. 数据挖掘中的聚类算法综述[J]. 计算机应用研究, 2007, 24(1):10-13. HE Ling, WU Lingda, CAI Yichao. Survey of clustering algorithms in data mining[J]. Application Research of Computers, 2007, 24(1):10-13.
[4] 孙吉贵,刘杰,赵连宇.聚类算法研究[J]. 软件学报, 2008,19(1): 48-61. SUN Jigui, LIU Jie, ZHAO Lianyu. Clustering algorithms research[J]. Journal of Software, 2008, 19(1):48-61.
[5] CHEN X, DENG C. Large scale spectral clustering with landmark-based representation[C] //Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2011:313-318.
[6] HUANG J, NIE F, HUANG H. Spectral rotation versus k-means in spectral clustering [C] //Proceedings of the 27th AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI, 2013: 431-437.
[7] 周林, 平西建, 徐森, 等. 基于谱聚类的聚类集成算法[J]. 自动化学报, 2012, 38(8):1335-1342. ZHOU Lin, PING Xijian, XU Sen, et al. Cluster ensemble based on spectral clustering[J]. Acta Automatica Sinica, 2012, 38(8):1335-1342.
[8] WU J, LIU H, XIONG H, et al. A theoretic framework of k-means-based consensus clustering[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1799-1805.
[9] XU R, WUNSCH D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3):645-678.
[10] 王守强,朱大铭,史士英. K-means聚类问题的改进近似算法[J].山东大学学报(工学版), 2011, 41(4):125-132. WANG Shouqiang, ZHU Daming, SHI Shiying. Improved approximation algorithm for the K-means clustering problem[J]. Journal of Shandong University(Engineering Science), 2011, 41(4):125-132.
[11] 贾洪杰,丁世飞,史忠植.求解大规模谱聚类的近似加权K-means算法[J].软件学报, 2015, 26(11):2836-2846. JIA Hongjie, DING Shifei, SHI Zhongzhi. Approximate weighted kernel K-means for large-scale spectral clustering[J]. Journal of Software, 2015, 26(11):2836- 2846.
[12] 雷小锋, 谢昆青, 林帆,等. 一种基于 K-Means 局部最优性的高效聚类算法[J]. 软件学报, 2008, 19(7): 1683-1692. LEI Xiaofeng, XIE Kunqing, LIN Fan, et al. An efficient clustering algorithm based on local optimality of K-means[J]. Journal of Software, 2008, 19(7):1683-1692.
[13] LI F, PERONA P. A bayesian hierarchical model for learning natural scene categories[C] //Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005:524-531.
[14] JURIE F, TRIGGS B. Creating efficient codebooks for visual recognition[C] //Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE, 2005:604-610.
[15] WU J, REHG J M. Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel[C] //Proceedings of the 12th Inter- national Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 630-637.
[16] WU J, TAN W C, REHG J M. Efficient and effective visual codebook generation using additive kernels[J]. Journal of Machine Learning Research, 2011,12(9):3097-3118.
[17] BANERJEE A, GHOSH J. Scalable clustering algorithms with balancing constraints[J]. Data Mining and Knowledge Discovery, 2006, 13(3):365-395.
[18] ZHU S, WANG D, LI T. Data clustering with size constraints[J]. Knowledge-Based Systems, 2010, 23(8):883-889.
[19] HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society(Series C, Applied Statistics), 1979, 28(1):100-108.
[20] TELGARSKY M, VATTANI A. Hartigans Method: k-means clustering without Voronoi[C] //Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: JMLR, 2010:820-827.
[21] SLONIM N, AHARONI E, CRAMMER K. Hartigans k-means versus Lloyds k-means: is it time for a change?[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1677-1684.
[22] PAPADIMITRIOU C H, STEIGLITZ K. Combinatorial optimization: algorithms and complexity[M]. New York, USA: Dover Publications, 1998.
[23] CAI D, HE X, HAN J. Document clustering using locality preserving indexing[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(12):1624-1637.
[24] RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336):846-850.
[25] JEGOU H, DOUZE M, SCHMID C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1):117-128.
[1] ZHU Ming, SHI Chenglong, LÜ Pan, LIU Xianrong, SUN Chi, CHEN Jiancheng, FAN Hongyun. Deformation prediction method and engineering application of deep foundation pit based on optimized LSTM method [J]. Journal of Shandong University(Engineering Science), 2025, 55(3): 141-148.
[2] Xiushan NIE,Yuling MA,Huiyan QIAO,Jie GUO,Chaoran CUI,Zhiyun YU,Xingbo LIU,Yilong YIN. Survey on student academic performance prediction from the perspective of task granularity [J]. Journal of Shandong University(Engineering Science), 2022, 52(2): 1-14.
[3] Gaoteng YUAN,Yihui LIU,Wei HUANG,Bing HU. MR image classification and recognition model of breast cancer based onGabor feature [J]. Journal of Shandong University(Engineering Science), 2020, 50(3): 15-23.
[4] Minghe GAO,Ying ZHANG,Rongrong ZHANG,Zihao HUANG,Linyan HUANG,Fanyu LI,Xin ZHANG,Yanhao WANG. Air quality prediction approach based on integrating forecasting dataset [J]. Journal of Shandong University(Engineering Science), 2020, 50(2): 91-99.
[5] Dapeng ZHANG,Yajun LIU,Wei ZHANG,Fen SHEN,Jiansheng YANG. Fake comment detection based on heterogeneous ensemble learning [J]. Journal of Shandong University(Engineering Science), 2020, 50(2): 1-9.
[6] Yutian LIU, Runjia SUN, Hongtao WANG, Xueping GU. Review on application of artificial intelligence in power system restoration [J]. Journal of Shandong University(Engineering Science), 2019, 49(5): 1-8.
[7] Tong LI,Ran MA,Honghe ZHENG,Ping AN,Xiangyu HU. An error sensitivity model based on video statistical features [J]. Journal of Shandong University(Engineering Science), 2019, 49(2): 116-121.
[8] Qijie ZOU,Haoyu LI,Rubo ZHANG,Tengda PEI,Yan LIU. Survey of human-robot interaction control for autonomous driving [J]. Journal of Shandong University(Engineering Science), 2019, 49(2): 23-33.
[9] Mian ZHANG,Ying HUANG,Haiyi MEI,Yu GUO. Intelligent interaction method for power distribution robot based on Kinect [J]. Journal of Shandong University(Engineering Science), 2018, 48(5): 103-108.
[10] LIU Yang, LIU Bo, WANG Feng. Optimization algorithm for big data mining based on parameter server framework [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(4): 1-6.
[11] WEI Bo, ZHANG Wensheng, LI Yuanxiang, XIA Xuewen, LYU Jingqin. A sparse online learning algorithm for feature selection [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(1): 22-27.
[12] MENG Lingheng, DING Shifei. Depth perceptual model based on the single image [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(3): 37-43.
[13] LIU Jie, YANG Peng, LYU Wensheng, LIU Agudamu, LIU Junxiu. Prediction models of PM2.5 mass concentration based on meteorological factors [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, 45(6): 76-83.
[14] ZHENG Yi, ZHU Chengzhang. A prediction method of atmospheric PM2.5 based on DBNs [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2014, 44(6): 19-25.
[15] XIE Lin1, YIN Xi-yao2, LI Fan-zhang3, WU Jia3. A kind of inverse resolution learning expression [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2013, 43(4): 46-50.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] ZHANG Yong-hua,WANG An-ling,LIU Fu-ping . The reflected phase angle of low frequent inhomogeneous[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(2): 22 -25 .
[2] LI Kan . Empolder and implement of the embedded weld control system[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2008, 38(4): 37 -41 .
[3] LAI Xiang . The global domain of attraction for a kind of MKdV equations[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(1): 87 -92 .
[4] YU Jia yuan1, TIAN Jin ting1, ZHU Qiang zhong2. Computational intelligence and its application in psychology[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(1): 1 -5 .
[5] CHEN Rui, LI Hongwei, TIAN Jing. The relationship between the number of magnetic poles and the bearing capacity of radial magnetic bearing[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(2): 81 -85 .
[6] WANG Bo,WANG Ning-sheng . Automatic generation and combinatory optimization of disassembly sequence for mechanical-electric assembly[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(2): 52 -57 .
[7] ZHANG Ying,LANG Yongmei,ZHAO Yuxiao,ZHANG Jianda,QIAO Peng,LI Shanping . Research on technique of aerobic granular sludge cultivationby seeding EGSB anaerobic granular sludge[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(4): 56 -59 .
[8] WANG Li-ju,HUANG Qi-cheng,WANG Zhao-xu . [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(6): 51 -56 .
[9] Yue Khing Toh1, XIAO Wendong2, XIE Lihua1. Wireless sensor network for distributed target tracking: practices via real test bed development[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(1): 50 -56 .
[10] SUN Weiwei, WANG Yuzhen. Finite gain stabilization of singlemachine infinite bus system subject to saturation[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(1): 69 -76 .