Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 76-82. doi: 10.6040/j.issn.1672-3961.0.2019.292
Abstract:
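A novel method based on a semi-supervised learning algorithm is proposed for predicting microRNA-binding residues in protein sequences. The prediction model combines the Laplacian support vector machine (LapSVM) algorithm with a newly proposed hybrid feature. The hybrid feature is built from three types of information: secondary structure information, HKM features, and a newly proposed feature that integrates amino acid physicochemical properties with evolutionary information. A comparison of the predictive performance of the individual features shows that this newly proposed feature contributes most to the improvement in prediction performance. With feature selection, the prediction model constructed in this study reaches an accuracy of 88.72%, a sensitivity of 54.18%, and a specificity of 91.15%, clearly outperforming the other methods it was compared with.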
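LapSVM, the classifier named in the abstract, comes from the manifold regularization framework: alongside the hinge loss on labeled residues and the usual kernel-norm penalty, its objective adds a term proportional to f^T L f, where L = D - W is the Laplacian of a graph built over both labeled and unlabeled samples. This is what lets unlabeled residues influence the model. The sketch below shows only that graph term, not the full LapSVM solver and not the authors' code. The k-nearest-neighbour graph, the heat-kernel weights, the parameters `k` and `t`, and the helper names are all assumptions chosen for illustration, since the abstract does not say how the graph is constructed.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def knn_heat_kernel_weights(points: Sequence[Sequence[float]], k: int, t: float) -> Matrix:
    """Symmetric k-nearest-neighbour graph with heat-kernel weights exp(-||xi - xj||^2 / t).

    Assumption: labeled and unlabeled feature vectors are all passed in `points`,
    so unlabeled residues help shape the regularizer.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared Euclidean distances from point i to every other point, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        for d2, j in dists[:k]:
            # Connect i and j if either is among the other's k nearest neighbours.
            w[i][j] = w[j][i] = math.exp(-d2 / t)
    return w


def graph_laplacian(w: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def manifold_penalty(lap: Matrix, f: Sequence[float]) -> float:
    """f^T L f, which equals 1/2 * sum_ij W_ij (f_i - f_j)^2 for symmetric W."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Two tight clusters of hypothetical 2-D feature vectors.
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    lap = graph_laplacian(knn_heat_kernel_weights(pts, k=1, t=1.0))
    print(manifold_penalty(lap, [1, 1, -1, -1]))  # scores constant within each cluster
    print(manifold_penalty(lap, [1, -1, 1, -1]))  # scores flip sign inside each cluster
```

In this toy case the penalty is zero for scores that are constant within each cluster and 8e^-1 for scores that change sign inside the clusters, so minimizing it pushes the classifier toward similar outputs for residues with similar feature vectors.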
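The abstract reports accuracy, sensitivity and specificity. Below is a minimal sketch of how these standard residue-level metrics are computed from binary predictions. It assumes binding residues are labeled 1 and non-binding residues 0; this encoding and the helper names `confusion_counts` and `residue_metrics` are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming 1 = binding residue and 0 = non-binding residue."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def residue_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # fraction of all residues classified correctly
        "sensitivity": tp / (tp + fn),  # fraction of true binding residues recovered
        "specificity": tn / (tn + fp),  # fraction of non-binding residues correctly rejected
    }


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(residue_metrics(y_true, y_pred))
```

For this toy input the accuracy is 4/6, the sensitivity 1/2 and the specificity 3/4. Accuracy always equals p * sensitivity + (1 - p) * specificity, where p is the fraction of binding residues, so when binding residues are a small minority the accuracy lies close to the specificity. This is consistent with the reported 88.72% accuracy sitting much nearer the 91.15% specificity than the 54.18% sensitivity.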