Journal of Shandong University (Engineering Science), 2011, Vol. 41, Issue (4): 91-94.

• Articles

AdaBoost-based under-sampling ensemble learning algorithm

SUN Xiao-yan1,2, ZHANG Hua-xiang1,2*, JI Hua1,2

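  1. Department of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250014, China;
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong 250014, China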
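  • Received: 2010-04-05 Online: 2011-08-16 Published: 2010-04-05
  • Corresponding author: ZHANG Hua-xiang (1966- ), male, born in Jining, Shandong, doctoral supervisor; main research interests include pattern recognition and evolutionary computation. E-mail: huaxzhang@163.com
  • About the author: SUN Xiao-yan (1987- ), female, born in Jinan, Shandong, master's student; main research interest is data mining. E-mail: xiaomeixi-1987@163.com
  • Supported by: Shandong Province Science and Technology Research Program projects (2008B0026, ZR2010FM021, 2010G0020115)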

An under-sampling approach based on AdaBoost for ensembled classification

SUN Xiao-yan1,2, ZHANG Hua-xiang1,2*, JI Hua1,2   

  1. Department of Information Science and Engineering, Shandong Normal University, Jinan 250014, China;
    2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014, China
  • Received: 2010-04-05 Online: 2011-08-16 Published: 2010-04-05

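Abstract (translated from the Chinese original):

When classifying imbalanced data sets, under-sampling tends to ignore some useful information in the majority class. To address this, an AdaBoost-based under-sampling ensemble learning algorithm, U-Ensemble, is proposed. The method first preprocesses the data set with the AdaBoost algorithm to obtain a weight for each sample. When training the base classifiers, the majority class data are no longer drawn by bootstrap sampling; instead, some samples with larger weights and some samples with smaller weights are each selected at random, so that the two parts together contain as many samples as the minority class, and these form the training data of a Bagging member classifier. Experimental results demonstrate the effectiveness of the algorithm.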

Keywords: imbalanced data sets; AdaBoost algorithm; under-sampling

Abstract:

In imbalanced data set classification, under-sampling tends to discard useful information in the majority class. To address this problem, we propose U-Ensemble, an AdaBoost-based under-sampling ensemble approach. First, AdaBoost is used to preprocess the imbalanced data set and obtain a weight for each sample. Then, when training the component classifiers of a Bagging ensemble, the majority class is no longer sampled by bootstrap; instead, some samples with larger weights and some with smaller weights are selected at random, such that the number of samples selected from the majority class equals the number of minority class samples. Finally, the sampled majority class samples and all the minority class samples are combined into the training data set for a component classifier. Experimental results show the effectiveness of U-Ensemble.

Key words: imbalanced data sets, AdaBoost, under-sampling
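
The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-sample weights have already been produced by an AdaBoost run, and the abstract does not say how "larger" and "smaller" weights are separated or how the quota is divided, so the median split and the roughly half-and-half draw below are assumptions. The function name `sample_balanced_subset` and the toy data are hypothetical.

```python
import random


def sample_balanced_subset(weights, labels, minority_label, rng):
    """Pick training indices for one component classifier.

    weights: per-sample weights, assumed to come from a prior AdaBoost run.
    labels:  class label of each sample.
    Returns all minority indices plus an equal number of majority indices.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_min = len(minority)
    if n_min > len(majority):
        raise ValueError("minority class must not outnumber the majority class")

    # Assumption: a median split separates "larger" from "smaller" weights.
    ranked = sorted(majority, key=lambda i: weights[i], reverse=True)
    half = len(ranked) // 2
    high_pool, low_pool = ranked[:half], ranked[half:]

    # Assumption: draw about half of the quota from each pool.
    n_high = n_min // 2
    n_low = n_min - n_high

    # Random selection without replacement inside each pool (no bootstrap).
    chosen = rng.sample(high_pool, n_high) + rng.sample(low_pool, n_low)
    return sorted(minority + chosen)


if __name__ == "__main__":
    # Toy data: 3 minority samples (label 1) and 10 majority samples (label 0).
    toy_labels = [1, 1, 1] + [0] * 10
    toy_weights = [0.20, 0.10, 0.10, 0.09, 0.08, 0.07, 0.06,
                   0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
    rng = random.Random(42)
    # One balanced index set per Bagging member (5 members here).
    for member in range(5):
        print(member, sample_balanced_subset(toy_weights, toy_labels, 1, rng))
```

Each call yields a different balanced index set, so repeating it once per ensemble member gives every component classifier its own view of the majority class while all minority samples are always kept.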
