JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, Vol. 48, Issue (3): 140-145. doi: 10.6040/j.issn.1672-3961.0.2017.410


Coefficient of variation clustering algorithm for non-uniform data

YANG Tianpeng1, XU Kunpeng1, CHEN Lifei1,2*   

1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China;
2. Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, Fujian, China
Received: 2017-08-24; Online: 2018-06-20; Published: 2017-08-24

Abstract: Because of the “uniform effect”, clustering non-uniform data remains an open and challenging problem for partition-based algorithms. To address this problem, a clustering algorithm based on the coefficient of variation (CVCN) was proposed. The “uniform effect” produced by K-means-type partitioning clustering algorithms was analyzed from the viewpoint of clustering optimization. Instead of the squared error, a new dispersion measure for non-uniform data was proposed based on the coefficient of variation. The clustering objective function was defined using a new dissimilarity formula for non-uniform data, also based on the coefficient of variation, and the clustering procedure was derived following a local optimization method. Experimental results on real and synthetic non-uniform datasets showed that CVCN achieved higher clustering accuracy than K-means, Verify2 and ESSC.
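The abstract names the coefficient of variation (CV, the standard deviation divided by the mean) as the basis for replacing the squared error, but it does not give CVCN's dispersion or dissimilarity formulas, so the sketch below does not implement them. It is only a minimal one-dimensional illustration, assuming population statistics and positive-valued data, of one property that separates the two quantities: the squared error of a cluster grows with the number of points it contains, whereas the CV of the same distribution does not. The helper names `squared_error` and `coefficient_of_variation` are hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical helpers for illustration only; these are NOT the CVCN formulas from the paper.


def squared_error(values):
    """Sum of squared deviations from the cluster mean (the 1-D K-means criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Dimensionless; undefined for a zero mean and only meaningful for positive-valued data.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return pstdev(values) / m


cluster = [2.0, 4.0, 6.0]
doubled = cluster * 2  # same distribution, twice as many points

print(squared_error(cluster), squared_error(doubled))  # 8.0 16.0
print(coefficient_of_variation(cluster), coefficient_of_variation(doubled))  # both about 0.408

# The same statistic applied to cluster sizes gauges how non-uniform a partition is.
print(coefficient_of_variation([40, 40, 40]))  # 0.0
print(coefficient_of_variation([100, 10, 10]))  # about 1.061
```

Doubling the cluster doubles its squared error while its CV is unchanged. Since a sum-of-squares objective is dominated by the members of large clusters, this size dependence is offered here only as an intuition for why K-means-type criteria can favor clusters of similar size; it is not the analysis carried out in the paper.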

Key words: clustering, partition-based clustering, coefficient of variation, K-means, uniform effect, non-uniform data

CLC Number: TP311