
Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue 3: 10-16. doi: 10.6040/j.issn.1672-3961.0.2017.405


Gene expression data classification based on artificial bee colony and SVM

YE Mingquan, GAO Lingyun, WAN Chunyuan

  1. Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu 241002, Anhui, China
  • Received: 2017-05-09 Online: 2018-06-20 Published: 2017-05-09
  • About the author: YE Mingquan (born 1973), male, from Dangtu, Anhui; professor, Ph.D. His main research interests include data mining and machine learning, biomedical informatics, and health and medical big data. E-mail: ymq@wnmc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61672386); Natural Science Foundation of Anhui Province (1708085MF142); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (16YJAZH071); Key Project of Provincial Natural Science Research in Anhui Universities (KJ2014A266)

Abstract: Gene expression data are high-dimensional, have few samples, and are noisy, which makes tumor classification and diagnosis challenging. To classify such data more accurately, the artificial bee colony (ABC) algorithm is used to optimize the kernel function parameters and the penalty factor of a support vector machine (SVM), with classification accuracy as the fitness function. The resulting gene expression data classification method is named ABC-SVM. Experiments were conducted on six public tumor gene expression datasets, and the results were compared with those of other classification methods. The results show that, working from a small set of selected informative genes, ABC-SVM achieves higher tumor classification accuracy and predicts tumor sample types more effectively.

Key words: artificial bee colony, support vector machine, intelligent optimization, tumor classification, bioinformatics, gene expression data
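
The search described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: it uses a basic ABC loop (employed, onlooker and scout phases), and the function names `abc_maximize` and `svm_cv_accuracy`, the colony size, the abandonment limit, the RBF kernel, the log2-scaled search over C and gamma, and the use of cross-validated accuracy are all assumptions introduced here. The SVM part relies on scikit-learn, which the page does not mention.

```python
import random


def abc_maximize(fitness, bounds, n_sources=10, limit=5, n_cycles=30, seed=0):
    """Maximize `fitness` over a box with a basic artificial bee colony loop.

    bounds: list of (low, high) pairs, one per parameter.
    Returns (best_position, best_fitness). Assumes fitness values are >= 0
    and n_sources >= 2.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    best = {"pos": None, "fit": float("-inf")}

    def evaluate(pos):
        # Score a position and remember the best one seen so far.
        f = fitness(pos)
        if f > best["fit"]:
            best["pos"], best["fit"] = list(pos), f
        return f

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(i):
        # Perturb one coordinate of source i relative to another source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)  # clip back into the box
        return cand

    def try_improve(i):
        # Greedy selection: keep the candidate only if it scores higher.
        cand = neighbour(i)
        f = evaluate(cand)
        if f > fits[i]:
            sources[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    fits = [evaluate(s) for s in sources]
    trials = [0] * n_sources

    for _ in range(n_cycles):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: choose sources with probability proportional to fitness.
        weights = [f + 1e-12 for f in fits]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Scout bees: replace sources that failed to improve more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                fits[i] = evaluate(sources[i])
                trials[i] = 0
    return best["pos"], best["fit"]


def svm_cv_accuracy(X, y, folds=5):
    """Build a fitness function: mean cross-validated accuracy of an RBF SVM.

    Positions are (log2 C, log2 gamma). Requires scikit-learn (an assumption).
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def fitness(pos):
        log2_c, log2_gamma = pos
        model = SVC(kernel="rbf", C=2.0 ** log2_c, gamma=2.0 ** log2_gamma)
        return cross_val_score(model, X, y, cv=folds).mean()

    return fitness
```

With a feature matrix `X` restricted to the selected informative genes and labels `y`, a call such as `abc_maximize(svm_cv_accuracy(X, y), [(-5.0, 15.0), (-15.0, 3.0)])` would return the best (log2 C, log2 gamma) pair found and its accuracy. The bounds shown are illustrative, not values taken from the paper.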

CLC number:

  • TP391