JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2018, Vol. 48 ›› Issue (3): 140-145.doi: 10.6040/j.issn.1672-3961.0.2017.410

Previous Articles    

Coefficient of variation clustering algorithm for non-uniform data

YANG Tianpeng1, XU Kunpeng1, CHEN Lifei1,2*   

  1. 1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China;
    2. Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, Fujian, China
  • Received:2017-08-24 Online:2018-06-20 Published:2017-08-24

Abstract: Affected by the “uniform effect”, a problem existed in the partition-based algorithms remained on open and challenging taskdue to handling. To solve this problem, a clustering algorithm based on coefficient of variation was proposed. The “uniform effect” caused by K-means-type partitioning clustering algorithm from the view of clustering optimization was analyzed. Instead of the squared error, a new measure of dispersion for non-uniform data was proposed relied on the coefficient of variation. The clustering objective optimization function was defined using a new non-uniform data dissimilarity formula, which was proposed based on the coefficient of variation. According to the local optimization method, the clustering algorithm process was given. The experimental results on real and synthetic non-uniform datasets showed that the clustering accuracy of CVCN was better than K-means, Verify2, ESSC.

Key words: clustering, partition-based clustering, coefficient of variation, K-means, uniform effect, non-uniform data

CLC Number: 

  • TP311
[1] 韩家炜,坎伯,裴健.数据挖掘:概念与技术[M]. 3版. 范明,孟小峰,译.北京: 机械工业出版社, 2012.
[2] BERKHIN P. A survey of clustering data mining techniques[J]. Grouping Multidimensional Data, 2002, 43(1): 25-71.
[3] 孙吉贵.刘杰,赵连宇.聚类算法研究[J].软件学报,2008,19(1): 48-61. SUN Jigui, LIU Jie, ZHAO Lianyu. Clustering algorithms research[J]. Journal of Software, 2008, 19(1): 48-61.
[4] JAIN A K, MURTY M N, FLYNN P J. Data clustering: a review[J]. Acm Computing Surveys, 1999, 31(3): 264-323.
[5] AGGARWAL C C, REDDY C K. Data clustering: algorithms and applications[M]. Boca Raton: CRC press, 2013.
[6] HE H, GARCIA E A. Learning from imbalanced data[J]. IEEE Transactions on Knowledge & Data Engineering, 2009, 21(9): 1263-1284.
[7] KRAWCZYK B. Learning from imbalanced data: open challenges and future directions[J]. Progress in Artificial Intelligence, 2016, 5(4): 1-12.
[8] HARTIGAN J A, WONG M A. Algorithm as 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society Series C:Applied Statistics, 1979, 28(1): 100-108.
[9] XIONG H, WU J, CHEN J. K-means clustering versus validation measures: a data-distribution perspective[J]. IEEE Transactions on Systems, Man, and Cybernetics: Part B: Cybernetics, 2009, 39(2): 318-331.
[10] WU J, XIONG H, CHEN J. Adapting the right measures for K-means clustering[C] //Proceedings of the the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France: ACM,2009: 877-886.
[11] KUMAR C N S, RAO K N, GOVARDHAN A. An empirical comparative study of novel clustering algorithms for class imbalance learning[C] //Proceedings of the Second International Conference on Computer and Communication Technologies(IC3T). Hyderabad, India: Springer India, 2016:181-191.
[12] KUMAR N S, RAO K N, GOVARDHAN A, et al. Undersampled K-means approach for handling imbalanced distributed data[J]. Progress in Artificial Intelligence, 2014, 3(1): 29-38.
[13] LIANG J, BAI L, DANG C, et al. The K-means-type algorithms versus imbalanced data distributions[J]. IEEE Transactions on Fuzzy Systems, 2012, 20(4): 728-745.
[14] MAHAJAN M, NIMBHORKAR P, VARADARAJAN K. The planar K-means problem is NP-hard[J]. Theoretical Computer Science, 2009, 442(8): 274-285.
[15] XU L, JORDAN M I. On convergence properties of the EM algorithm for Gaussian mixtures[J]. Neural Computation, 1996, 8(1): 129-151.
[16] MCLACHLAN G J, KRISHNAN T. The EM Algorithm and Extensions, Second Edition[M]. New York:[s.n.] , 2007.
[17] JAIN A K. Data clustering: 50 years beyond K-means[J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
[18] BROWN C E. Applied multivariate statistics in geohydrology and related sciences[M]. Berlin: Springer, 1998.
[19] EVERITT B. Cambridge dictionary of statistics[M]. Cambridge:Cambridge University Press, 2002.
[20] 齐敏. 模式识别导论[M]. 北京:清华大学出版社, 2009.
[21] ALOISE D, DESHPANDE A, HANSEN P, et al. NP-hardness of Euclidean sum-of-squares clustering[J]. Machine Learning, 2009, 75(2): 245-248.
[22] DENG Z H, CHOI K S, CHUNG F L, et al. Enhanced soft subspace clustering integrating within-cluster and between-cluster information[J]. Pattern Recognition, 2010, 43(3): 767-781.
[23] LI X, CHEN Z,YANG F. Exploring of clustering algorithm on class-imbalanced data[C] //Proceedings of the 8th International Conference on Computer Science & Education(ICCSE). Columbo, Sri Lanka: IEEE, 2013:89-93.
[24] CHEN L, JIANG Q, WANG S. A probability model for projective clustering on high dimensional data[C] //Eighth IEEE International Conference on Data Mining. Pisa, Italy: IEEE Computer Society, 2008:755-760.
[25] STREHL A, GHOSH J. Cluster ensembles-a knowledge reuse framework for combining multiple partitions[J]. Journal of Machine Learning Research, 2002, 3(3): 583-617.
[26] 陈黎飞, 吴涛. 数据挖掘中的特征约简[M]. 北京: 科学出版社, 2016.
[1] LI Xiaohui, LIU Xiaofei, SUN Weitong, ZHAO Yi, DONG Yuan, JIN Yinli. An inspection task assignment and path planning algorithm based on vehicles-UAVs collaboration [J]. Journal of Shandong University(Engineering Science), 2025, 55(5): 101-109.
[2] CHEN Sugen, ZHAO Zhizhong. Density peak clustering combining local truncation distance and small clusters merging [J]. Journal of Shandong University(Engineering Science), 2025, 55(2): 58-70.
[3] ZHU Hengdong, MA Yingcang, DAI Xuezhen. Adaptive semi-supervised neighborhood clustering algorithm [J]. Journal of Shandong University(Engineering Science), 2021, 51(4): 24-34.
[4] ZHU Changming, YUE Wen, WANG Panhong, SHEN Zhenyu, ZHOU Rigui. Global and local multi-view multi-label learning with active three-way clustering [J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 34-46.
[5] XIE Ziqi, WANG Lihong, LI Man. Active learning of pairwise constraints in block diagonal subspace clustering [J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 65-73.
[6] Bei LI,Song ZHAO,Zhijia XIE,Meng NIU. Electric vehicle virtual energy storage available capacity modeling [J]. Journal of Shandong University(Engineering Science), 2020, 50(6): 101-111.
[7] Xinyu DONG,Hanyue CHEN,Jiaguo LI,Qingyan MENG,Shihe XING,Liming ZHANG. An unsupervised color image segmentation method based on fusion of multiple methods [J]. Journal of Shandong University(Engineering Science), 2019, 49(2): 96-101.
[8] Jun QIN,Yuanpeng ZHANG,Yizhang JIANG,Wenlong HANG. Transfer fuzzy clustering based on self-constraint of multiple medoids [J]. Journal of Shandong University(Engineering Science), 2019, 49(2): 107-115.
[9] Yingxue ZHU,Ruizhang HUANG,Can MA. A short text dynamic clustering approach bias on new topic [J]. Journal of Shandong University(Engineering Science), 2018, 48(6): 8-18.
[10] Qiyue SONG, Xuewen MU, Huan CHENG. Segmentation of connected characters based on improved drop-fall algorithm [J]. Journal of Shandong University(Engineering Science), 2018, 48(6): 89-94.
[11] ZHANG Peirui, YANG Yan, XING Huanlai, YU Xiuying. Incremental multi-view clustering algorithm based on kernel K-means [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 48-53.
[12] DU Xixi, LIU Huafeng, JING Liping. An additive co-clustering for recommendation of integrating social network [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 96-102.
[13] XIAO Miaomiao, WEI Benzheng, YIN Yilong. A hybrid intrusion detection system based on BFOA and K-means algorithm [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(3): 115-119.
[14] PANG Renming, WANG Bo, YE Hao, ZHANG Haifeng, LI Mingliang. Clustering of blast furnace historical data based on PCA similarity factor and spectral clustering [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2017, 47(5): 143-149.
[15] ZHOU Wang, ZHANG Chenlin, WU Jianxin. Qualitative balanced clustering algorithm based on Hartigan-Wong and Lloyd [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(5): 37-44.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] WANG Yong, XIE Yudong. Gas control technology of largeflow pipe[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 70 -74 .
[2] XIA Bin,ZHANG Lian-jun . Energy comparison-based TOA estimation algorithm for the DS-CDMA UWB system[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(1): 70 -73 .
[3] LIU Xin 1, SONG Sili 1, WANG Xinhong 2. [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 98 -100 .
[4] KONG Wei-tao,ZHANG Qing-fan,ZHANG Cheng-hui . DSP based implementation of the space vector pulse width modulation[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2008, 38(3): 81 -84 .
[5] . [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(2): 104 -107 .
[6] XUE Qiang,AI Xing,ZHAO Jun,ZHOU Yong-hui,YUAN Xun-liang . Effects of TiC nano-sized particle on the microstructure and properties of Si3N4 composite ceramics[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2008, 38(3): 69 -72 .
[7] DIAO Yong, TIAN Si-Meng, CAO Zhe-Meng. Geological work method for the construction of the Yichang Wanzhou Railway tunnel in high risk karst areas[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2009, 39(5): 91 -95 .
[8] YAO Zhan-yong,SHANG Qing-sen,ZHAO Zhi-zhong,JIA Zhao-xia . The influence analysis of the semirigid asphalt pavement configuration stress and distortion by interface conditions[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 93 -99 .
[9] DENG Bin,WANG Jiang . Estimating parameters of a neuron model based on chaos synchronization and adaptive control[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(5): 19 -23 .
[10] WANG Xue-ping,WANG Deng-jie,SUN Ying-ming*,DONG Lei . Application of the nonprism total station in the detection of a highway bridge[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2007, 37(3): 105 -108 .