
Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue 2: 76-82. doi: 10.6040/j.issn.1672-3961.0.2019.292

• Machine Learning and Data Mining •

Prediction of microRNA-binding residues based on Laplacian support vector machine and sequence information

Xin MA¹, Xue WANG²

  1. School of Statistics and Mathematics, Nanjing Audit University, Nanjing 211815, Jiangsu, China
  2. Experimental Center, Nanjing Audit University, Nanjing 211815, Jiangsu, China
  • Received: 2019-06-06  Publication date: 2020-04-20  Release date: 2020-04-16
  • About the author: MA Xin (born 1982), female, from Zhenjiang, Jiangsu; associate professor, PhD, master's supervisor; research interests: bioinformatics and machine learning. E-mail: maxin@nau.edu.cn

Abstract:

A semi-supervised learning method is proposed for predicting microRNA (miRNA)-binding residues in protein sequences. The prediction model combines the Laplacian support vector machine (LapSVM) algorithm with a newly proposed hybrid feature. The hybrid feature combines three types of information: secondary structure information, HKM features, and a newly proposed feature that integrates amino acid physicochemical properties with evolutionary information. A comparison of the individual features showed that this new feature contributed the most to the improvement in prediction performance. With feature selection, the model achieved an accuracy of 88.72%, a sensitivity of 54.18% and a specificity of 91.15%, clearly outperforming the other methods compared.

Key words: microRNA-binding residues, Laplacian support vector machine, evolutionary information, physicochemical properties, feature selection
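
LapSVM comes from the manifold-regularization framework of Belkin, Niyogi and Sindhwani (2006). It adds a graph-Laplacian penalty f^T L f to a kernel classifier's objective. That penalty is computed over labeled and unlabeled residues alike, so the decision function is pushed to vary smoothly along the neighbourhood graph of all the data. Solving LapSVM itself needs a quadratic-programming step. The minimal sketch below therefore swaps in Laplacian regularized least squares (LapRLS) from the same framework. It uses the same two regularizers but a squared loss instead of the hinge loss, which reduces training to one linear system. Everything here is an assumption for illustration: the RBF kernel, the symmetric binary 3-NN graph, the values gamma_A = 0.01 and gamma_I = 1.0, the toy 2-D points standing in for residue feature vectors, and all function names. It is not the authors' LapSVM implementation or parameter setting.

```python
# Hypothetical sketch: LapRLS (a close relative of LapSVM), pure Python, toy-sized data only.
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rbf(a, b, sigma):
    # Gaussian (RBF) kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    return math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))

def knn_laplacian(X, k):
    """Unnormalised graph Laplacian L = D - W of a symmetrised binary kNN graph."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((sq_dist(X[i], X[j]), j) for j in range(n) if j != i)[:k]
        for _, j in nearest:
            W[i][j] = W[j][i] = 1.0
    return [[(sum(W[i]) if i == j else -W[i][j]) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Gaussian elimination with partial pivoting (A square and non-singular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lap_rls_fit(X, y, gamma_A=0.01, gamma_I=1.0, sigma=1.0, k=3):
    """X lists the l labeled points first, then the u unlabeled ones.

    Solves (J K + gamma_A*l*I + gamma_I*l/(l+u)^2 * L K) alpha = Y, where
    J selects labeled rows and Y holds the labels padded with zeros.
    """
    n, l = len(X), len(y)
    K = [[rbf(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    L = knn_laplacian(X, k)
    LK = [[sum(L[i][m] * K[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    c = gamma_I * l / n ** 2
    A = [[(K[i][j] if i < l else 0.0)          # J K: labeled rows only
          + c * LK[i][j]                      # manifold (Laplacian) term
          + (gamma_A * l if i == j else 0.0)  # ambient (RKHS norm) term
          for j in range(n)] for i in range(n)]
    Y = list(y) + [0.0] * (n - l)
    return solve(A, Y)

def decision(alpha, X, x, sigma=1.0):
    # f(x) = sum_i alpha_i K(x_i, x); its sign is the predicted class
    return sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))

X = [(0.0, 0.0), (3.0, 3.0),                 # labeled: +1 ("binding"), -1
     (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),     # unlabeled, near the +1 point
     (3.2, 3.1), (3.1, 2.9), (2.9, 3.2)]     # unlabeled, near the -1 point
alpha = lap_rls_fit(X, [1.0, -1.0])
print([1 if decision(alpha, X, x) > 0 else -1 for x in X[2:]])
```

With one labeled point per cluster, the Laplacian term carries each label to the unlabeled neighbours in the same cluster. Pure Python keeps the sketch dependency-free, but the dense O(n³) solve is only practical for toy sizes.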

CLC number: Q811.4

Table 1

The Main dataset, composed of PDB and UniProt sequences

Database   Protein sequence IDs
PDB        2LI8_A, 2N82_B, 3A6P_A, 3A6P_C, 3ADI_A, 3ADL_A, 3TRZ_A, 4L8R_C, 4NGB_A, 4QOZ_B, 4W5N_A
UniProt    O04379, O04492, P09651, P43243, P48432, P98175, Q01860, Q06787, Q1PRL4, Q2VB19, Q3UHX9, Q4R979, Q5D1E8, Q5RCW2, Q6GLC9, Q80U58, Q8CJF8, Q8K3Y3, Q8R205, Q8R418, Q9JIK5, Q9R0B7, Q9SKN5, Q9U489, Q9UGR2, Q9XGW1, Q9ZVD5
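
Table 1 mixes two kinds of identifiers. The PDB entries are chain-level IDs (entry_chain), and the UniProt entries are accessions. The sketch below shows one way to sanity-check the lists before rebuilding the dataset. It splits the PDB tokens into (entry, chain) pairs and checks each UniProt token against the accession pattern published in the UniProt documentation. The helper name and the choice to raise on malformed tokens are illustrative assumptions. Downloading the sequences and residue-level binding labels is not shown.

```python
import re

# UniProt accession pattern as documented by UniProt
UNIPROT_ACC = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})")
# PDB entry (4 characters, the first a digit 1-9) followed by "_" and a chain ID
PDB_CHAIN = re.compile(r"([1-9][A-Za-z0-9]{3})_([A-Za-z0-9]+)")

def parse_main_dataset(pdb_field, uniprot_field):
    """Hypothetical helper: parse the two comma-separated ID lists of Table 1."""
    pdb = []
    for token in pdb_field.split(","):
        m = PDB_CHAIN.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"malformed PDB chain ID: {token!r}")
        pdb.append((m.group(1).upper(), m.group(2)))
    uniprot = [t.strip() for t in uniprot_field.split(",")]
    bad = [t for t in uniprot if not UNIPROT_ACC.fullmatch(t)]
    if bad:
        raise ValueError(f"malformed UniProt accessions: {bad}")
    return pdb, uniprot

pdb, uniprot = parse_main_dataset(
    "2LI8_A,2N82_B,3A6P_A,3A6P_C,3ADI_A,3ADL_A,3TRZ_A,4L8R_C,4NGB_A,4QOZ_B,4W5N_A",
    "O04379,O04492,P09651,P43243,P48432,P98175,Q01860,Q06787,Q1PRL4,Q2VB19,"
    "Q3UHX9,Q4R979,Q5D1E8,Q5RCW2,Q6GLC9,Q80U58,Q8CJF8,Q8K3Y3,Q8R205,Q8R418,"
    "Q9JIK5,Q9R0B7,Q9SKN5,Q9U489,Q9UGR2,Q9XGW1,Q9ZVD5")
# chains, distinct PDB entries, UniProt accessions
print(len(pdb), len({entry for entry, _ in pdb}), len(uniprot))
```

Counting distinct entries matters here because 3A6P contributes two chains (A and C).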

Table 2

Prediction performance of LapSVM models built with different features

Feature               Accuracy  Sensitivity  Specificity  MCC
PSSM                  0.7112    0.2157       0.7482       0.021
PSSMPP                0.7820    0.3333       0.8155       0.096
PSSMPP+SS             0.8365    0.4510       0.8653       0.221
PSSMPP+HKM            0.8316    0.4142       0.8611       0.188
PSSMPP+SS+HKM         0.8752    0.4933       0.9035       0.291
Optimal 44 features   0.8872    0.5418       0.9115       0.334
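
Table 2 grows the feature vector block by block: PSSM, then PSSMPP, then secondary structure (SS) and HKM. Residue-level binding-site predictors commonly describe each residue by the profile rows inside a sliding window centred on it, and then concatenate the per-block windows. The sketch below shows that encoding, assuming zero padding at the sequence termini and a hypothetical window half-width w. The section does not give the window size or say how PSSMPP combines the PSSM with physicochemical properties, so the blocks here are generic placeholders rather than the paper's features.

```python
from itertools import chain

def window_encode(profile, w):
    """Encode residue i by rows i-w..i+w of a per-residue profile.

    profile: equal-length feature vectors, one per residue (e.g. 20 PSSM
    scores, or a 3-state secondary-structure one-hot). Positions outside
    the sequence are zero-padded.
    """
    n, dim = len(profile), len(profile[0])
    pad = [0.0] * dim
    return [list(chain.from_iterable(profile[j] if 0 <= j < n else pad
                                     for j in range(i - w, i + w + 1)))
            for i in range(n)]

def concat_blocks(*blocks):
    """Concatenate per-residue feature blocks, e.g. PSSM window + SS window."""
    return [list(chain.from_iterable(rows)) for rows in zip(*blocks)]

# Toy 5-residue protein: made-up PSSM scores and a helix/strand/coil one-hot
pssm = [[float((i * 7 + k) % 5 - 2) for k in range(20)] for i in range(5)]
ss = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0],
      [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
features = concat_blocks(window_encode(pssm, 1), window_encode(ss, 1))
print(len(features), len(features[0]))  # residues, features per residue
```

With w = 1, each residue gets 3 × 20 PSSM values plus 3 × 3 SS values. Adding a further block, such as HKM, only means passing one more per-residue list to `concat_blocks`.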

Figure 1

MCC curve of the models built on 165 feature subsets
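
Figure 1 plots MCC over 165 feature subsets, and Table 2 reports the best of them (44 features). One common way to produce such a curve is to rank the features once and then evaluate nested prefixes of that ranking. That procedure is assumed here and is not taken from the paper. The sketch receives the ranking and an evaluation routine as inputs; the routine might, for example, return the cross-validated MCC of a classifier trained on the selected columns. The scorer in the example is a hypothetical stand-in that only exists to make the code runnable.

```python
from typing import Callable, List, Sequence, Tuple

def best_nested_subset(ranked: Sequence[int],
                       evaluate: Callable[[List[int]], float]) -> Tuple[int, List[float]]:
    """Score the top-1, top-2, ..., top-n features of a ranking.

    Returns the subset size with the highest score (the smallest one on
    ties) and the whole score curve, i.e. the data behind an MCC-vs-size plot.
    """
    curve = [evaluate(list(ranked[:k])) for k in range(1, len(ranked) + 1)]
    best_k = max(range(len(curve)), key=curve.__getitem__) + 1
    return best_k, curve

# Hypothetical scorer standing in for "train on these columns, return MCC"
toy_mcc = {1: 0.10, 2: 0.21, 3: 0.33, 4: 0.30, 5: 0.28}
ranking = [7, 2, 9, 4, 1]
best_k, curve = best_nested_subset(ranking, lambda feats: toy_mcc[len(feats)])
print(best_k, ranking[:best_k], curve)
```

Taking the smallest subset on ties favours the more compact model when two sizes score equally.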

Table 3

Comparison of the prediction performance of different machine learning algorithms

Algorithm  Accuracy  Sensitivity  Specificity  MCC
RF         0.7398    0.3529       0.7687       0.072
SVM        0.4101    0.1373       0.4305       0.000
LapSVM     0.8872    0.5418       0.9115       0.334
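
Tables 2 and 3 report accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). The sketch below gives their standard definitions from a confusion matrix (TP, FP, TN, FN) and applies them to a made-up matrix. It also assumes, though the section does not state it, that all three rate metrics are pooled over the same test residues. Under that assumption, accuracy is the prevalence-weighted average Acc = p * Sens + (1 - p) * Spec, where p is the fraction of binding residues, so p = (Spec - Acc) / (Spec - Sens). The helper applies that identity to the LapSVM row as a rough consistency check. A small p would explain why accuracy sits close to specificity in every row.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on binding residues), specificity, MCC."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return acc, sens, spec, mcc

def implied_positive_fraction(acc, sens, spec):
    """Solve acc = p * sens + (1 - p) * spec for p (assumes pooled metrics)."""
    return (spec - acc) / (spec - sens)

# Made-up confusion matrix: 100 binding and 900 non-binding residues
print(binary_metrics(tp=50, fp=50, tn=850, fn=50))
# LapSVM row of Table 3
print(round(implied_positive_fraction(0.8872, 0.5418, 0.9115), 4))
```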