
Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 10-16. doi: 10.6040/j.issn.1672-3961.0.2017.405


Gene expression data classification based on artificial bee colony and SVM

YE Mingquan, GAO Lingyun, WAN Chunyuan

  1. Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu 241002, Anhui, China
  • Received: 2017-05-09 Online: 2018-06-20 Published: 2017-05-09
  • About the author: YE Mingquan (born 1973), male, from Dangtu, Anhui; professor, Ph.D. His main research interests are data mining and machine learning, biomedical informatics, and health and medical big data. E-mail: ymq@wnmc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61672386); Natural Science Foundation of Anhui Province (1708085MF142); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (16YJAZH071); Key Program of the Provincial Natural Science Research Fund of Anhui Higher Education Institutions (KJ2014A266)

Abstract: Gene expression data are high-dimensional, small-sample, and noisy, which makes tumor classification and diagnosis challenging. To classify tumor gene expression data more accurately, the kernel function parameters and penalty factor of a support vector machine (SVM) were optimized with the artificial bee colony (ABC) algorithm, using classification accuracy as the fitness function. The resulting gene expression data classification method is named ABC-SVM. Experiments were conducted on six public tumor gene expression datasets, and the results were compared with those of other classification methods. The results show that, working from a small set of selected informative genes, ABC-SVM achieves higher tumor classification accuracy than the compared methods and predicts tumor sample types more effectively.

Key words: artificial bee colony, support vector machine, intelligent optimization, tumor classification, bioinformatics, gene expression data
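
The search loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration of a basic ABC variant (employed, onlooker, and scout phases) maximizing a fitness function over two SVM hyperparameters; it is not the authors' implementation. The names `abc_maximize`, `surrogate_accuracy`, and `BOUNDS`, the log2 search ranges, and all default settings are assumptions made for this example. In particular, `surrogate_accuracy` is a made-up smooth function that stands in for the cross-validated SVM accuracy a real run would compute.

```python
import math
import random


def abc_maximize(fitness, bounds, n_sources=10, limit=10, n_iter=50, seed=0):
    """Maximize fitness(position) inside a box with a basic artificial bee colony."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbor(i):
        # Perturb one coordinate of source i relative to a different source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)  # clamp to the search box
        return cand

    def try_improve(i):
        # Greedy selection: keep the candidate only if it scores higher.
        cand = neighbor(i)
        score = fitness(cand)
        if score > values[i]:
            sources[i], values[i], trials[i] = cand, score, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    values = [fitness(s) for s in sources]
    trials = [0] * n_sources
    best_val = max(values)
    best_pos = list(sources[values.index(best_val)])

    for _ in range(n_iter):
        # Employed bees: one local search step per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability tied to their fitness.
        floor = min(values)
        weights = [v - floor + 1e-12 for v in values]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Memorize the best source before scouts can overwrite it.
        top = max(range(n_sources), key=values.__getitem__)
        if values[top] > best_val:
            best_pos, best_val = list(sources[top]), values[top]
        # Scout bees: abandon sources that failed to improve `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = fitness(sources[i])
                trials[i] = 0
    return best_pos, best_val


def surrogate_accuracy(position):
    """Hypothetical stand-in for cross-validated SVM accuracy (not real data)."""
    log2_c, log2_gamma = position
    return 0.95 * math.exp(-((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2) / 50.0)


# Assumed search box: log2 of the penalty factor and log2 of the RBF kernel width.
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]

best_pos, best_val = abc_maximize(surrogate_accuracy, BOUNDS)
print("best log2(C), log2(gamma):", best_pos, "fitness:", best_val)
```

To use the sketch on real data, `surrogate_accuracy` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation accuracy; searching in log2 space is a design choice of this example, not something the abstract specifies.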

CLC Number:

  • TP391