山东大学学报(工学版) ›› 2016, Vol. 46 ›› Issue (2): 29-34.doi: 10.6040/j.issn.1672-3961.0.2015.101
江峰1,杜军威1,刘国柱1,眭跃飞2
JIANG Feng1, DU Junwei1, LIU Guozhu1, SUI Yuefei2
摘要: 针对现有的K-modes聚类初始化方法没有考虑不同的属性具有不同的重要性这一问题,提出一种基于加权密度与加权重叠距离的初始中心选择算法Ini-Weight。Ini-Weight算法通过计算每个对象的密度以及对象之间的距离来选择初始中心。在计算对象的密度以及对象的距离时,Ini-Weight算法根据每个属性的重要性为不同的属性赋予不同的权值。最后,在UCI数据集上将Ini-Weight与现有的方法进行了比较,结果表明,Ini-Weight算法可以有效地区分不同的属性,而且提高了初始中心选择的准确性。
中图分类号:
[1] HAN J W, KAMBER M. Data mining: concepts and techniques[M]. 2nd edition. Burlington, USA: Morgan Kaufmann Publishers, 2006. [2] JAIN A K, MURTY M N, FLYNN P J. Data clustering: a review[J]. ACM Computing Surveys, 1999, 31(3):264-323. [3] 陈文强, 林琛, 陈珂,等. 基于GraphLab的分布式近邻传播聚类算法[J]. 山东大学学报(工学版), 2013, 43(5):13-18. CHEN Wenqiang, LIN Chen, CHEN Ke, et al. Distributed affinity propagation clustering algorithm based on GraphLab[J]. Journal of Shandong University(Engineering Science), 2013, 43(5):13-18. [4] ANDERBERG M R. Cluster analysis for applications[M]. New York: Academic Press, 1973. [5] BAI L, LIANG J Y, SUI C, et al. Fast global K-means clustering based on local geometrical information[J]. Information Sciences, 2013, 245:168-180. [6] HUANG Z X. Extensions to the K-means algorithm for clustering large data sets with categorical values[J]. Data Mining and Knowledge Discovery, 1998, 2(3):283-304. [7] WU S, JIANG Q S, HUANG Z X. A new initialization method for clustering categorical data[C] //Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Nanjing, China: Springer, 2007:972-980. [8] CAO F Y, LIANG J Y, BAI L. A new initialization method for categorical data clustering[J]. Expert Systems with Application, 2009, 36(7):10223-10228. [9] BAI L, LIANG J Y, DANG C Y. An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data[J]. Knowledge-Based Systems, 2011, 24(6):785-795. [10] KHAN S S, AHMAD A. Cluster center initialization algorithm for K-modes clustering[J]. Expert Systems with Applications, 2013, 40(18):7444-7456. [11] BRADLEY P S, FAYYAD U M. Refining initial points for K-means clustering[C] //Proceedings of the 15th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann, 1998:91-99. [12] CAO F Y, LIANG J Y, JIANG G. An initialization method for the K-means algorithm using neighborhood model[J]. Computers and Mathematics with Applications, 2009, 58(3):474-483. [13] 王守强, 朱大铭, 史士英. K-means聚类问题的改进近似算法[J]. 山东大学学报(工学版), 2011, 41(4):125-132. WANG Shouqiang, ZHU D M, SHI Shiying. Improved approximation algorithm for the K-means clustering problem[J]. Journal of Shandong University(Engineering Science), 2011, 41(4):125-132. [14] 夏战国, 万玲, 蔡世玉,等. 一种面向入侵检测的半监督聚类算法[J]. 山东大学学报(工学版), 2012, 42(6):1-7. XIA Zhanguo, WAN Ling, CAI Shiyu, et al. A semi-supervised clustering algorithm oriented to intrusion detection[J]. Journal of Shandong University(Engineering Science), 2012, 42(6):1-7. [15] PAWLAK Z. Rough sets: theoretical aspects of reasoning about data[M]. Dordrecht, Netherlands: Kluwer Academic Publishers, 1991. [16] SHANNON C E. The mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3-4):373-423. [17] D(¨overU)NTSCH I, GEDIGA G. Uncertainty measures of rough set prediction[J]. Artificial Intelligence, 1998, 106(1):109-137. [18] 苗夺谦, 胡桂荣. 知识约简的一种启发式算法[J]. 计算机研究与发展, 1999, 36(6):681-684. MIAO Duoqian, HU Guirong. A heuristic algorithm for reduction of knowledge[J]. Computer Research and Development, 1999, 36(6):681-684. [19] LIANG J Y, SHI Z Z. The information entropy, rough entropy and knowledge granulation in rough set theory[J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2004, 12(1):37-46. [20] JIANG F, SUI Y F, ZHOU L. A relative decision entropy-based feature selection approach[J]. Pattern Recognition, 2015, 48(7):2151-2163. [21] 江峰, 王莎莎, 杜军威,等. 基于近似决策熵的属性约简[J]. 控制与决策, 2015, 30(1):65-70. JIANG Feng, WANG Shasha, DU Junwei, et al. Attribute reduction based on approximation decision entropy[J]. Control and Decision, 2015, 30(1):65-70. [22] 梁吉业, 白亮, 曹付元. 基于新的距离度量的K-modes聚类算法[J]. 计算机研究与发展, 2010, 47(10):1749-1755. LIANG Jiye, BAI Liang, CAO Fuyuan. K-modes clustering algorithm based on a new distance measure[J]. Journal of Computer Research and Development, 2010, 47(10):1749-1755. [23] 徐章艳, 刘作鹏, 杨炳儒,等. 一个复杂度为max(O(|C||U|), O(|C|2|U/C|))的快速属性约简算法[J]. 计算机学报, 2006, 29(3):391-399. XU Zhangyan, LIU Zuopeng, YANG Bingru, et al. A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|2|U/C|))[J]. Chinese Journal of Computers, 2006, 29(3):391-399. [24] HUANG Z X, NG M K. A fuzzy K-modes algorithm for clustering categorical data[J]. IEEE Transactions on Fuzzy Systems, 1999, 7(4):446-452. |
[1] | 景运革,李天瑞. 基于知识粒度的增量约简算法[J]. 山东大学学报(工学版), 2016, 46(1): 1-9. |
[2] | 高峰1,迟春梅2. 决策表中属性的重排[J]. 山东大学学报(工学版), 2013, 43(5): 6-12. |
[3] | 樊伟. 一种多粒度粗糙区间模糊集方法[J]. 山东大学学报(工学版), 2013, 43(1): 63-68. |
[4] | 陈玉明,吴克寿,谢荣生. 基于相对知识粒度的决策表约简[J]. 山东大学学报(工学版), 2012, 42(6): 8-12. |
[5] | 李慧1,2,胡云1,3,李存华1. 基于粗糙集理论的瓦斯灾害信息特征提取技术[J]. 山东大学学报(工学版), 2012, 42(5): 91-95. |
[6] | 杨习贝1,2,黄佳玲1,周君仪3,杨静宇2. 不完备系统中基于特征相容块的粗糙集[J]. 山东大学学报(工学版), 2012, 42(5): 1-6. |
[7] | 吴克寿,陈玉明,曾志强. 基于邻域关系的决策表约简[J]. 山东大学学报(工学版), 2012, 42(2): 7-10. |
[8] | 翟俊海,高原原,王熙照,陈俊芬. 基于划分子集的属性约简算法[J]. 山东大学学报(工学版), 2011, 41(4): 24-28. |
[9] | 焦吉成,高学东,王元璞,赵传领 . 关系积理论及属性约简算法[J]. 山东大学学报(工学版), 2008, 38(2): 112-116 . |
[10] | 李成栋,雷红,史开泉 . 一种基于粗集的模糊系统设计方法[J]. 山东大学学报(工学版), 2006, 36(4): 73-80 . |
[11] | 孟健,崔明辉,史开泉 . 变异粗集在集对分析中的应用[J]. 山东大学学报(工学版), 2006, 36(3): 65-68 . |
[12] | 李秀红,张东升 . (α,β)-粗糙集模型中阈值的确定与解释[J]. 山东大学学报(工学版), 2006, 36(2): 81-85 . |
[13] | 管延勇,胡海清,王洪凯 . α-粗糙集模型中的不可分辨关系[J]. 山东大学学报(工学版), 2006, 36(1): 75-80 . |
|