Qualitative balanced clustering algorithm based on Hartigan-Wong and Lloyd

ZHOU Wang, ZHANG Chenlin, WU Jianxin   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, Jiangsu, China
  • Received: 2016-03-01 Online: 2016-10-20 Published: 2016-03-01

Abstract: The traditional Hartigan-Wong clustering algorithm can produce unbalanced clusters. To address this problem, Charl, a novel qualitative balanced clustering method, was proposed to improve the balance level without requiring absolute balance. Charl combined ideas from both Lloyd's method and the Hartigan-Wong method, and used an adaptive tuning strategy to adjust the balance level. The algorithm was a batch processing method and therefore shared the efficiency benefits of Lloyd's method. Experiments on 13 benchmark datasets showed that Charl not only produced more balanced output groups, but also achieved lower cost values and better clustering performance (in terms of accuracy, normalized mutual information and time cost) than Lloyd's method. This qualitative balancing method also outperformed the quantitative balanced clustering method by a large margin.

Key words: qualitative balancing, Hartigan-Wong, balanced clustering, Lloyd, machine learning
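
The abstract names two classical assignment rules without spelling them out. The sketch below is not Charl: the adaptive tuning strategy is not described in the abstract and is not reproduced here. It only contrasts the textbook nearest-center rule of Lloyd's method with the textbook size-weighted move criterion of Hartigan-Wong, under which moving a point x from cluster a to cluster b lowers the total cost when n_b/(n_b+1)·||x − c_b||² < n_a/(n_a−1)·||x − c_a||². The function names (`lloyd_label`, `hartigan_label`) and the toy data are hypothetical, chosen only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_label(x, centers):
    """Lloyd's rule: assign x to the nearest center, ignoring cluster sizes."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def hartigan_label(x, current, centers, sizes):
    """Hartigan-Wong rule: pick the cluster with the smallest size-weighted cost.

    Staying in `current` costs n/(n-1) * d^2 (the cost removed by taking x out);
    joining another cluster costs n/(n+1) * d^2 (the cost added by putting x in).
    Assumption of this sketch: a point alone in its cluster has a stay cost of 0.
    """
    def cost(j):
        n = sizes[j]
        d = sq_dist(x, centers[j])
        if j == current:
            return d * n / (n - 1) if n > 1 else 0.0
        return d * n / (n + 1)

    return min(range(len(centers)), key=cost)


# Hypothetical 1-D toy data: cluster 0 holds two points, cluster 1 holds one.
clusters = [[(0.0,), (2.0,)], [(3.5,)]]
centers = [centroid(c) for c in clusters]
sizes = [len(c) for c in clusters]

x = (2.0,)  # currently a member of cluster 0
lloyd_choice = lloyd_label(x, centers)
hartigan_choice = hartigan_label(x, 0, centers, sizes)
```

On this toy data the two rules disagree: the point is closer to the center of its own cluster, so Lloyd's rule keeps it there, while the size-weighted criterion moves it to the singleton cluster, which lowers the total within-cluster sum of squares from 2.0 to 1.125. Applying a size-aware criterion to all points in one pass, rather than one point at a time, is one plausible reading of "batch processing" in the abstract, but that reading is an assumption.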

