Journal of Shandong University (Engineering Science), 2015, Vol. 45, Issue (5): 36-42. doi: 10.6040/j.issn.1672-3961.2.2015.190


Hierarchical cost sensitive decision tree and its application in the prediction of the mobile phone replacement

XIONG Bingyan, WANG Guoyin, DENG Weibin   

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Published:2020-05-26

Abstract: Mobile phone user data are imbalanced between users who replace their phones and users who do not, yet traditional data mining pursues the best overall accuracy, which leaves the prediction accuracy for replacement users too low. To solve this problem, a method for predicting which users will replace their phones was proposed, based on a hierarchical cost-sensitive decision tree. The algorithm performed attribute reduction and calculated attribute importance using rough sets, then built a hierarchical structure by partitioning the attributes. Finally, a cost-sensitive decision tree served as the base classifier for the hierarchical structure; the tree was constructed with a splitting criterion that combined the Gini index and misclassification cost. Three experiments were carried out on user data from a telecom operator. The results showed that the hierarchical cost-sensitive decision tree performed better on both the imbalanced user data and the balanced user data obtained by under-sampling.
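
The abstract does not state the exact splitting criterion, so the following is only a minimal sketch of how a Gini index and a misclassification cost might be combined when scoring a candidate split. The mixing weight `alpha`, the function names, and the cost values are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def expected_cost(labels, cost):
    """Lowest average misclassification cost when the node predicts one class.

    cost[(true, predicted)] is the penalty for that error; pairs that are
    not listed, and correct predictions, cost 0.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    classes = set(counts) | {c for pair in cost for c in pair}
    best = min(
        sum(counts[t] * cost.get((t, p), 0.0) for t in counts if t != p)
        for p in classes
    )
    return best / n


def split_score(left, right, cost, alpha=0.5):
    """Size-weighted blend of Gini impurity and expected cost (lower is better).

    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    score = 0.0
    for part in (left, right):
        weight = len(part) / n
        blended = alpha * gini(part) + (1 - alpha) * expected_cost(part, cost)
        score += weight * blended
    return score


# Hypothetical costs: missing a replacement user (class 1) is 5x worse
# than flagging a non-replacement user (class 0) by mistake.
example_cost = {(1, 0): 5.0, (0, 1): 1.0}
pure = split_score([0, 0], [1, 1], example_cost)
mixed = split_score([0, 1], [0, 1], example_cost)
```

With these assumed costs, a split that separates the two classes cleanly scores lower than one that leaves both children mixed, so a tree built on such a criterion would prefer it. Raising the cost of missing a replacement user pushes mixed nodes toward predicting the minority class.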

Key words: hierarchical structure, decision tree, cost sensitive, imbalance data, prediction of replacing phone

CLC Number: 

  • TP391