JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2016, Vol. 46 ›› Issue (2): 29-34.doi: 10.6040/j.issn.1672-3961.0.2015.101

Previous Articles     Next Articles

A weight-based initial centers selection algorithm for K-modes clustering

JIANG Feng1, DU Junwei1, LIU Guozhu1, SUI Yuefei2   

  1. 1. College of Information Science &
    Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China;
    2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2015-04-15 Online:2016-04-20 Published:2015-04-15

Abstract: The current initialization methods for K-modes clustering do not consider the case in which various attributes have different significances. To solve this problem, a weighted density and weighted overlap distance-based initial center selection algorithm(called Ini-Weight)was proposed. In algorithm Ini-Weight, initial centers were selected by calculating the density of each object and the distance between any two objects. In Ini-Weight, when calculating the density of each object and the distance between any two objects, different weights were assigned to different attributes according to the significance of each attribute. Finally, Ini-Weight was compared with the current methods on UCI data sets. The results showed that Ini-Weight algorithm could effectively distinguish different attributes and improve the accuracy for selecting initial centers.

Key words: initial centers, weighted overlap distance, rough sets, K-modes clustering, weighted average density

CLC Number: 

  • TP181
[1] HAN J W, KAMBER M. Data mining: concepts and techniques[M]. 2nd edition. Burlington, USA: Morgan Kaufmann Publishers, 2006.
[2] JAIN A K, MURTY M N, FLYNN P J. Data clustering: a review[J]. ACM Computing Surveys, 1999, 31(3):264-323.
[3] 陈文强, 林琛, 陈珂,等. 基于GraphLab的分布式近邻传播聚类算法[J]. 山东大学学报(工学版), 2013, 43(5):13-18. CHEN Wenqiang, LIN Chen, CHEN Ke, et al. Distributed affinity propagation clustering algorithm based on GraphLab[J]. Journal of Shandong University(Engineering Science), 2013, 43(5):13-18.
[4] ANDERBERG M R. Cluster analysis for applications[M]. New York: Academic Press, 1973.
[5] BAI L, LIANG J Y, SUI C, et al. Fast global K-means clustering based on local geometrical information[J]. Information Sciences, 2013, 245:168-180.
[6] HUANG Z X. Extensions to the K-means algorithm for clustering large data sets with categorical values[J]. Data Mining and Knowledge Discovery, 1998, 2(3):283-304.
[7] WU S, JIANG Q S, HUANG Z X. A new initialization method for clustering categorical data[C] //Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Nanjing, China: Springer, 2007:972-980.
[8] CAO F Y, LIANG J Y, BAI L. A new initialization method for categorical data clustering[J]. Expert Systems with Application, 2009, 36(7):10223-10228.
[9] BAI L, LIANG J Y, DANG C Y. An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data[J]. Knowledge-Based Systems, 2011, 24(6):785-795.
[10] KHAN S S, AHMAD A. Cluster center initialization algorithm for K-modes clustering[J]. Expert Systems with Applications, 2013, 40(18):7444-7456.
[11] BRADLEY P S, FAYYAD U M. Refining initial points for K-means clustering[C] //Proceedings of the 15th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann, 1998:91-99.
[12] CAO F Y, LIANG J Y, JIANG G. An initialization method for the K-means algorithm using neighborhood model[J]. Computers and Mathematics with Applications, 2009, 58(3):474-483.
[13] 王守强, 朱大铭, 史士英. K-means聚类问题的改进近似算法[J]. 山东大学学报(工学版), 2011, 41(4):125-132. WANG Shouqiang, ZHU D M, SHI Shiying. Improved approximation algorithm for the K-means clustering problem[J]. Journal of Shandong University(Engineering Science), 2011, 41(4):125-132.
[14] 夏战国, 万玲, 蔡世玉,等. 一种面向入侵检测的半监督聚类算法[J]. 山东大学学报(工学版), 2012, 42(6):1-7. XIA Zhanguo, WAN Ling, CAI Shiyu, et al. A semi-supervised clustering algorithm oriented to intrusion detection[J]. Journal of Shandong University(Engineering Science), 2012, 42(6):1-7.
[15] PAWLAK Z. Rough sets: theoretical aspects of reasoning about data[M]. Dordrecht, Netherlands: Kluwer Academic Publishers, 1991.
[16] SHANNON C E. The mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3-4):373-423.
[17] D(¨overU)NTSCH I, GEDIGA G. Uncertainty measures of rough set prediction[J]. Artificial Intelligence, 1998, 106(1):109-137.
[18] 苗夺谦, 胡桂荣. 知识约简的一种启发式算法[J]. 计算机研究与发展, 1999, 36(6):681-684. MIAO Duoqian, HU Guirong. A heuristic algorithm for reduction of knowledge[J]. Computer Research and Development, 1999, 36(6):681-684.
[19] LIANG J Y, SHI Z Z. The information entropy, rough entropy and knowledge granulation in rough set theory[J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2004, 12(1):37-46.
[20] JIANG F, SUI Y F, ZHOU L. A relative decision entropy-based feature selection approach[J]. Pattern Recognition, 2015, 48(7):2151-2163.
[21] 江峰, 王莎莎, 杜军威,等. 基于近似决策熵的属性约简[J]. 控制与决策, 2015, 30(1):65-70. JIANG Feng, WANG Shasha, DU Junwei, et al. Attribute reduction based on approximation decision entropy[J]. Control and Decision, 2015, 30(1):65-70.
[22] 梁吉业, 白亮, 曹付元. 基于新的距离度量的K-modes聚类算法[J]. 计算机研究与发展, 2010, 47(10):1749-1755. LIANG Jiye, BAI Liang, CAO Fuyuan. K-modes clustering algorithm based on a new distance measure[J]. Journal of Computer Research and Development, 2010, 47(10):1749-1755.
[23] 徐章艳, 刘作鹏, 杨炳儒,等. 一个复杂度为max(O(|C||U|), O(|C|2|U/C|))的快速属性约简算法[J]. 计算机学报, 2006, 29(3):391-399. XU Zhangyan, LIU Zuopeng, YANG Bingru, et al. A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|2|U/C|))[J]. Chinese Journal of Computers, 2006, 29(3):391-399.
[24] HUANG Z X, NG M K. A fuzzy K-modes algorithm for clustering categorical data[J]. IEEE Transactions on Fuzzy Systems, 1999, 7(4):446-452.
[1] CHEN Yu-ming, WU Ke-shou, XIE Rong-sheng. Reduction for decision table based on relative knowledge granularity [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(6): 8-12.
[2] WU Ke-shou, CHEN Yu-ming, ZENG Zhi-qiang. Decision table reduction based on neighborhood relation [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(2): 7-10.
[3] ZHAI Jun-hai, GAO Yuan-yuan, WANG Xi-zhao, CHEN Jun-fen. An attribute reduction algorithm based on partition subset [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2011, 41(4): 24-28.
[4] ZHAO Jun-kai,CAI Cheng-wen,SHI Kai-quan, . Function S-rough set and open-loop control system disturbs recognition [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 60-63 .
[5] LI Chengdong,LEI Hong,SHI Kaiquan . [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(4): 73-80 .
[6] MENG Jian,CUI Minghui,SHI Kaiquan, . The use of variation rough sets in set pair analysis [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(3): 65-68 .
[7] HU Ming-li,HU Hai-qing . Two direction rough decision-making and decision discernment on assistant set [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(1): 69-74 .
[8] LIU Ji-qin,ZHANG Ping,YAN Jian-jun. . Variation knowledge and its filter characters [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(1): 93-99 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LI Ke,LIU Chang-chun,LI Tong-lei . Medical registration approach using improved maximization of mutual information[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2006, 36(2): 107 -110 .
[2] YUE Yuan-Zheng. Relaxation in glasses far from equilibrium[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(5): 1 -20 .
[3] WANG Yong, XIE Yudong. Gas control technology of largeflow pipe[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 70 -74 .
[4] LI Hui-ping, ZHAO Guo-qun, ZHANG Lei, HE Lian-fang. The development status of hot stamping and quenching of ultra high-strength steel[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2010, 40(3): 69 -74 .
[5] LIU Xin 1, SONG Sili 1, WANG Xinhong 2. [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 98 -100 .
[6] LI Shijin, WANG Shengte, HUANG Leping. Change detection with remote sensing images based on forward-backward heterogenicity[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 1 -9 .
[7] ZHAO Zhi-guang,WANG Deng-jie,TIAN Yun-fei . Roadbed settlement based on the gray theory[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 86 -88 .
[8] WANG Xue-ping,WANG Deng-jie,SUN Ying-ming*,DONG Lei . Application of the nonprism total station in the detection of a highway bridge[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 105 -108 .
[9] MENG Jian, LI Yibin, LI Bin. Bound gait controlling method of quadruped robot[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, 45(3): 28 -34 .
[10] HANG Guang-qing,KONG Fan-yu,LI Da-xing, . Efficient algorithm with resistance to simple power analysis on Koblitz curves[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 78 -80 .