JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2015, Vol. 45, Issue (5): 36-42. doi: 10.6040/j.issn.1672-3961.2.2015.190


Hierarchical cost sensitive decision tree and its application in the prediction of the mobile phone replacement

XIONG Bingyan, WANG Guoyin, DENG Weibin   

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Received: 2015-05-18; Revised: 2015-08-06; Online: 2018-10-20; Published: 2015-05-18

Abstract: Mobile phone user data show a class imbalance between replacement users and non-replacement users. Traditional data mining pursues the best overall accuracy, which leaves the prediction accuracy for replacement users too low. To solve this problem, a method for predicting which users will replace their phones was proposed based on a hierarchical cost-sensitive decision tree. The algorithm first performed attribute reduction and computed the importance of attributes using rough set theory, then built a hierarchical structure by partitioning the attributes. Finally, a cost-sensitive decision tree served as the base classifier of the hierarchical structure; the tree was constructed with a splitting criterion that combined the Gini index and the misclassification cost. Three experiments were conducted on user data from a telecom operator. The results showed that the hierarchical cost-sensitive decision tree performed better both on the imbalanced user data and on balanced user data obtained by under-sampling.
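The abstract does not give the exact formulas, so the following Python sketch only illustrates the standard ingredients it names, under stated assumptions. It covers three of them: Pawlak rough-set attribute significance, sig(a) = γ_C(D) − γ_{C−{a}}(D), computed from the dependency degree of the decision on the condition attributes; a split score that mixes the weighted Gini index with the average misclassification cost; and random under-sampling. Several details are hypothetical and not taken from the paper: the mixing weight `alpha`, the linear combination itself, the cost-matrix layout, the function names, and the choice of *random* under-sampling. The hierarchical partitioning of attributes is not reproduced.

```python
import random
from collections import Counter, defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """gamma_B(D) = |POS_B(D)| / |U|: fraction of rows whose equivalence
    class under attrs contains a single decision label."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)


def significance(rows, labels, attrs):
    """Rough-set significance sig(a) = gamma_C(D) - gamma_{C-{a}}(D)."""
    full = dependency(rows, labels, attrs)
    return {a: full - dependency(rows, labels, [b for b in attrs if b != a])
            for a in attrs}


def gini(counts):
    """Gini index 1 - sum_k p_k^2 of a node, given its class counts."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def node_cost(counts, cost):
    """Total misclassification cost if the node predicts its cheapest class.
    cost[(true, pred)] must be defined for every pair of classes."""
    classes = sorted({t for t, _ in cost})
    return min(sum(counts[t] * cost[(t, p)] for t in classes) for p in classes)


def split_score(left, right, cost, alpha=0.5):
    """HYPOTHETICAL combined criterion (lower is better):
    alpha * weighted Gini + (1 - alpha) * average misclassification cost per row."""
    n = len(left) + len(right)
    parts = [Counter(left), Counter(right)]
    g = sum(sum(p.values()) / n * gini(p) for p in parts)
    c = sum(node_cost(p, cost) for p in parts) / n
    return alpha * g + (1 - alpha) * c


def undersample(labels, majority, seed=0):
    """ASSUMED random under-sampling: keep every minority row plus an equally
    sized random subset of majority rows; returns sorted row indices."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y != majority]
    major = [i for i, y in enumerate(labels) if y == majority]
    return sorted(minority + rng.sample(major, min(len(major), len(minority))))
```

With a cost matrix that penalizes missing a replacement user more heavily, for example `cost = {(0, 0): 0, (0, 1): 1, (1, 0): 5, (1, 1): 0}`, the cost term pushes each node toward predicting the minority class. Among splits with similar Gini values, it therefore favors those that leave fewer expensive errors.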

Key words: hierarchical structure, imbalanced data, phone replacement prediction, cost-sensitive, decision tree

CLC Number: TP391