JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2016, Vol. 46 ›› Issue (5): 37-44.doi: 10.6040/j.issn.1672-3961.1.2016.031

Previous Articles     Next Articles

Qualitative balanced clustering algorithm based on Hartigan-Wong and Lloyd

ZHOU Wang, ZHANG Chenlin, WU Jianxin   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, Jiangsu, China
  • Received:2016-03-01 Online:2016-10-20 Published:2016-03-01

Abstract: The traditional Hartigan-Wong clustering algorithm could cause the unbalanced clustering problem. To solve this problem, Charl which is a novel qualitative balanced clustering method was proposed to improve the balance level while the absolute balance was not required. Charl combined ideas from both the Lloyds method and the Hartigan-Wong method, Charl proposed an adaptive tuning strategy to tune the balance level. This algorithm was a batch processing method, which shared the efficiency benefits of the Lloyds method. Experiments on 13 benchmark datasets showed that Charl not only produced more balanced output groups, but also achieved lower cost values and higher clustering performances(in terms of accuracy, normal mutual information and time cost)than the Lloyds method. This qualitative balancing method also outperformed the quantitative balanced clustering method by a large margin.

Key words: qualitative balancing, Hartigan-Wong, balanced clustering, Lloyd, machine learning

CLC Number: 

  • TP391
[1] LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2):129-137.
[2] BISHOP C M. Pattern recognition and machine learning[M]. Singapore: Springer, 2006.
[3] 贺玲, 吴玲达, 蔡益朝. 数据挖掘中的聚类算法综述[J]. 计算机应用研究, 2007, 24(1):10-13. HE Ling, WU Lingda, CAI Yichao. Survey of clustering algorithms in data mining[J]. Application Research of Computers, 2007, 24(1):10-13.
[4] 孙吉贵,刘杰,赵连宇.聚类算法研究[J]. 软件学报, 2008,19(1): 48-61. SUN Jigui, LIU Jie, ZHAO Lianyu. Clustering algorithms research[J]. Journal of Software, 2008, 19(1):48-61.
[5] CHEN X, DENG C. Large scale spectral clustering with landmark-based representation[C] //Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2011:313-318.
[6] HUANG J, NIE F, HUANG H. Spectral rotation versus k-means in spectral clustering [C] //Proceedings of the 27th AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI, 2013: 431-437.
[7] 周林, 平西建, 徐森, 等. 基于谱聚类的聚类集成算法[J]. 自动化学报, 2012, 38(8):1335-1342. ZHOU Lin, PING Xijian, XU Sen, et al. Cluster ensemble based on spectral clustering[J]. Acta Automatica Sinica, 2012, 38(8):1335-1342.
[8] WU J, LIU H, XIONG H, et al. A theoretic framework of k-means-based consensus clustering[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1799-1805.
[9] XU R, WUNSCH D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3):645-678.
[10] 王守强,朱大铭,史士英. K-means聚类问题的改进近似算法[J].山东大学学报(工学版), 2011, 41(4):125-132. WANG Shouqiang, ZHU Daming, SHI Shiying. Improved approximation algorithm for the K-means clustering problem[J]. Journal of Shandong University(Engineering Science), 2011, 41(4):125-132.
[11] 贾洪杰,丁世飞,史忠植.求解大规模谱聚类的近似加权K-means算法[J].软件学报, 2015, 26(11):2836-2846. JIA Hongjie, DING Shifei, SHI Zhongzhi. Approximate weighted kernel K-means for large-scale spectral clustering[J]. Journal of Software, 2015, 26(11):2836- 2846.
[12] 雷小锋, 谢昆青, 林帆,等. 一种基于 K-Means 局部最优性的高效聚类算法[J]. 软件学报, 2008, 19(7): 1683-1692. LEI Xiaofeng, XIE Kunqing, LIN Fan, et al. An efficient clustering algorithm based on local optimality of K-means[J]. Journal of Software, 2008, 19(7):1683-1692.
[13] LI F, PERONA P. A bayesian hierarchical model for learning natural scene categories[C] //Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005:524-531.
[14] JURIE F, TRIGGS B. Creating efficient codebooks for visual recognition[C] //Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE, 2005:604-610.
[15] WU J, REHG J M. Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel[C] //Proceedings of the 12th Inter- national Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 630-637.
[16] WU J, TAN W C, REHG J M. Efficient and effective visual codebook generation using additive kernels[J]. Journal of Machine Learning Research, 2011,12(9):3097-3118.
[17] BANERJEE A, GHOSH J. Scalable clustering algorithms with balancing constraints[J]. Data Mining and Knowledge Discovery, 2006, 13(3):365-395.
[18] ZHU S, WANG D, LI T. Data clustering with size constraints[J]. Knowledge-Based Systems, 2010, 23(8):883-889.
[19] HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society(Series C, Applied Statistics), 1979, 28(1):100-108.
[20] TELGARSKY M, VATTANI A. Hartigans Method: k-means clustering without Voronoi[C] //Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: JMLR, 2010:820-827.
[21] SLONIM N, AHARONI E, CRAMMER K. Hartigans k-means versus Lloyds k-means: is it time for a change?[C] //Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013:1677-1684.
[22] PAPADIMITRIOU C H, STEIGLITZ K. Combinatorial optimization: algorithms and complexity[M]. New York, USA: Dover Publications, 1998.
[23] CAI D, HE X, HAN J. Document clustering using locality preserving indexing[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(12):1624-1637.
[24] RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336):846-850.
[25] JEGOU H, DOUZE M, SCHMID C. Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1):117-128.
[1] Mian ZHANG,Ying HUANG,Haiyi MEI,Yu GUO. Intelligent interaction method for power distribution robot based on Kinect [J]. Journal of Shandong University(Engineering Science), 2018, 48(5): 103-108.
[2] LIU Yang, LIU Bo, WANG Feng. Optimization algorithm for big data mining based on parameter server framework [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(4): 1-6.
[3] WEI Bo, ZHANG Wensheng, LI Yuanxiang, XIA Xuewen, LYU Jingqin. A sparse online learning algorithm for feature selection [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(1): 22-27.
[4] MENG Lingheng, DING Shifei. Depth perceptual model based on the single image [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(3): 37-43.
[5] LIU Jie, YANG Peng, LYU Wensheng, LIU Agudamu, LIU Junxiu. Prediction models of PM2.5 mass concentration based on meteorological factors [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, 45(6): 76-83.
[6] ZHENG Yi, ZHU Chengzhang. A prediction method of atmospheric PM2.5 based on DBNs [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2014, 44(6): 19-25.
[7] XIE Lin1, YIN Xi-yao2, LI Fan-zhang3, WU Jia3. A kind of inverse resolution learning expression [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2013, 43(4): 46-50.
[8] HE Xue-ying1, 2, QIN Wei1, YIN Yi-long1 *, ZHAO Lian-zheng1,QIAO Hao3. Video-based fingerprint verification using machine learning [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2011, 41(4): 29-33.
[9] LIANG Chun-lin1, PENG Ling-xi2*. An immune network based unsupervised classifier [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2010, 40(5): 82-86.
[10] GUO Mao-Zu, ZOU Quan, LI Wen-Bin, HAN Ying-Peng. Learning in bioinformatics [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(3): 1-6.
Full text



[1] CHENG Daizhan, LI Zhiqiang. A survey on linearization of nonlinear systems[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 26 -36 .
[2] WANG Yong, XIE Yudong. Gas control technology of largeflow pipe[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 70 -74 .
[3] LIU Xin 1, SONG Sili 1, WANG Xinhong 2. [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 98 -100 .
[5] CHEN Huaxin, CHEN Shuanfa, WANG Binggang. The aging behavior and mechanism of base asphalts[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 125 -130 .
[7] LI Shijin, WANG Shengte, HUANG Leping. Change detection with remote sensing images based on forward-backward heterogenicity[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 1 -9 .
[8] ZHAO Ke-Jun, WANG Xin-Jun, LIU Xiang, CHOU Yi-Hong. Algorithms of continuous top-k join query over structured overlay networks[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(5): 32 -37 .
[9] ZHAO Zhi-guang,WANG Deng-jie,TIAN Yun-fei . Roadbed settlement based on the gray theory[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 86 -88 .
[10] YAO Zhan-yong,SHANG Qing-sen,ZHAO Zhi-zhong,JIA Zhao-xia . The influence analysis of the semirigid asphalt pavement configuration stress and distortion by interface conditions[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 93 -99 .