Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 76-82. doi: 10.6040/j.issn.1672-3961.0.2019.292

Machine Learning & Data Mining

Prediction of microRNA-binding residues based on Laplacian support vector machine and sequence information

Xin MA1, Xue WANG2

1. School of Statistics and Mathematics, Nanjing Audit University, Nanjing 211815, Jiangsu, China
2. Experimental Center, Nanjing Audit University, Nanjing 211815, Jiangsu, China
Received: 2019-06-06   Online: 2020-04-20   Published: 2020-04-16

Abstract:

A semi-supervised learning method was proposed to predict miRNA-binding residues in protein sequences. The Laplacian support vector machine (LapSVM) algorithm was combined with newly proposed hybrid features to build the prediction model. The hybrid features combine secondary structure information, HKM features, and a new feature that joins amino acid physicochemical properties with evolutionary information. A performance comparison of the individual features showed that this new feature contributed most to the improvement in prediction. With feature selection, the LapSVM model achieved an accuracy of 88.72%, a sensitivity of 54.18%, and a specificity of 91.15%, and it significantly outperformed other approaches to miRNA-binding site prediction.

Key words: microRNA-binding residues, Laplacian support vector machine, evolutionary information, physicochemical properties, feature selection

CLC Number: Q811.4
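The abstract does not spell out the LapSVM formulation. The following is a minimal sketch of the general technique (manifold regularization as introduced by Belkin, Niyogi and Sindhwani), assuming an RBF kernel, a fully connected heat-kernel graph over labeled and unlabeled samples, and a primal objective with squared hinge loss, minimized by plain gradient descent: (1/l) Σ max(0, 1 − y_i f(x_i))² + γ_A αᵀKα + (γ_I/n²) fᵀLf. The original LapSVM of Belkin et al. uses the ordinary hinge loss solved through a dual quadratic program instead. All function names and hyperparameter values below are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def graph_laplacian(X, gamma=1.0):
    # Fully connected graph with heat-kernel edge weights; L = D - W
    W = rbf_kernel(X, X, gamma)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def fit_lapsvm(X, y, gamma_A=1e-2, gamma_I=1e-1, kernel_gamma=1.0, iters=2000):
    """Primal LapSVM with squared hinge loss (hypothetical helper).

    X holds labeled and unlabeled samples; y is +1/-1 for labeled rows, 0 for unlabeled.
    Returns theta = [alpha_1..alpha_n, b] for f(x) = sum_j alpha_j k(x_j, x) + b.
    """
    n = X.shape[0]
    lab = y != 0
    l = int(lab.sum())
    K = rbf_kernel(X, X, kernel_gamma)
    L = graph_laplacian(X, kernel_gamma)
    c = gamma_I / n**2
    # A maps theta to the outputs f on the labeled points
    A = np.hstack([K[lab], np.ones((l, 1))])
    # Hessian with every labeled point active bounds the curvature -> safe constant step
    H = (2.0 / l) * A.T @ A
    H[:n, :n] += 2 * gamma_A * K + 2 * c * K @ L @ K
    step = 1.0 / np.linalg.eigvalsh(H).max()
    theta = np.zeros(n + 1)
    for _ in range(iters):
        f = K @ theta[:n] + theta[n]            # outputs on all (labeled + unlabeled) points
        slack = np.maximum(0.0, 1.0 - y[lab] * f[lab])
        g_f = -(2.0 / l) * slack * y[lab]       # d(loss)/d(f) on labeled points
        grad = A.T @ g_f
        # ambient (RKHS) and intrinsic (graph) regularizers act on alpha only
        grad[:n] += 2 * gamma_A * K @ theta[:n] + 2 * c * K @ (L @ f)
        theta -= step * grad
    return theta

def predict(theta, X_train, X_new, kernel_gamma=1.0):
    n = X_train.shape[0]
    return np.sign(rbf_kernel(X_new, X_train, kernel_gamma) @ theta[:n] + theta[n])

# Toy usage: two clusters, one labeled point in each, the rest unlabeled
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (10, 2)), rng.normal(2.0, 0.3, (10, 2))])
y = np.zeros(20)
y[0], y[10] = 1.0, -1.0
theta = fit_lapsvm(X, y)
pred = predict(theta, X, X)
```

For residue-level prediction, the rows of X would be residue feature vectors, with unlabeled residues given y = 0. The graph construction (k-nearest neighbours versus fully connected) and the loss are the main design choices in such a sketch.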

Table 1  Main dataset collected from PDB and UniProt

Database  Protein sequence IDs
PDB       2LI8_A, 2N82_B, 3A6P_A, 3A6P_C, 3ADI_A, 3ADL_A, 3TRZ_A, 4L8R_C, 4NGB_A, 4QOZ_B, 4W5N_A
UniProt   O04379, O04492, P09651, P43243, P48432, P98175, Q01860, Q06787, Q1PRL4, Q2VB19, Q3UHX9, Q4R979, Q5D1E8, Q5RCW2, Q6GLC9, Q80U58, Q8CJF8, Q8K3Y3, Q8R205, Q8R418, Q9JIK5, Q9R0B7, Q9SKN5, Q9U489, Q9UGR2, Q9XGW1, Q9ZVD5
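Each protein in Table 1 contributes one sample per residue. The page does not describe how the per-residue features (PSSM, physicochemical properties, secondary structure, HKM) are assembled into a sample. A common approach in sequence-based binding-residue prediction is to concatenate the per-residue feature blocks and then take a sliding window centred on each residue, zero-padding at the termini. The sketch below assumes that approach; the function names, window half-width and toy values are hypothetical.

```python
from typing import List, Sequence

def concat_blocks(*blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Join per-residue feature blocks (e.g. PSSM rows, property values) residue by residue."""
    n = len(blocks[0])
    assert all(len(b) == n for b in blocks), "blocks must cover the same residues"
    return [[v for b in blocks for v in b[i]] for i in range(n)]

def window_features(profile: Sequence[Sequence[float]], half_width: int = 3) -> List[List[float]]:
    """Concatenate rows i-w..i+w for each residue i; positions outside the sequence are zero-padded."""
    n = len(profile)
    pad = [0.0] * len(profile[0])
    samples = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(profile[j] if 0 <= j < n else pad)
        samples.append(row)
    return samples

# Toy usage: a 4-residue chain with 2 hypothetical PSSM columns and 1 property column
pssm = [[1, 2], [3, 4], [5, 6], [7, 8]]
prop = [[0.1], [0.2], [0.3], [0.4]]
X = window_features(concat_blocks(pssm, prop), half_width=1)
```

With half_width = 1 each residue yields 3 × 3 = 9 values; a wider window trades more sequence context for a higher-dimensional sample.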

Table 2  Predictive performance of the LapSVM model based on various features

Feature               Accuracy  Sensitivity  Specificity  MCC
PSSM                  0.7112    0.2157       0.7482       0.021
PSSMPP                0.7820    0.3333       0.8155       0.096
PSSMPP+SS             0.8365    0.4510       0.8653       0.221
PSSMPP+HKM            0.8316    0.4142       0.8611       0.188
PSSMPP+SS+HKM         0.8752    0.4933       0.9035       0.291
Optimal 44 features   0.8872    0.5418       0.9115       0.334
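Table 2 reports accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). Assuming the standard definitions from a binary confusion matrix, with binding residues as the positive class, they can be computed as below. The counts in the usage line are invented for illustration and are not taken from the paper.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int):
    """Accuracy, sensitivity, specificity and MCC from a binary confusion matrix.

    Positive class = binding residue. MCC is set to 0.0 when any marginal is empty,
    a common convention for the undefined case.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, sensitivity, specificity, mcc

# Illustrative counts only: 100 binding vs 1000 non-binding residues
acc, sens, spec, mcc = binary_metrics(tp=50, fp=100, tn=900, fn=50)
```

Here accuracy is about 86% even though half of the binding residues are missed, because non-binding residues dominate. MCC (about 0.34 in this toy case) is less inflated by such class imbalance, which is why it is reported alongside accuracy.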

Fig. 1  MCC curve of the models constructed from 165 feature subsets
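The page does not say how the 165 feature subsets in Fig. 1 were formed. One common scheme that produces such a curve is incremental feature selection: rank the features, evaluate the top-1, top-2, ..., top-N subsets, and keep the subset with the highest MCC. The sketch below assumes that scheme; `score` is a hypothetical callback standing in for training the classifier on a subset and returning its MCC.

```python
from typing import Callable, List, Sequence, Tuple

def incremental_feature_selection(
    ranked: Sequence[int], score: Callable[[List[int]], float]
) -> Tuple[List[int], List[float]]:
    """Evaluate nested top-k subsets of a ranked feature list.

    Returns the best-scoring subset and the score curve (one value per k).
    Ties keep the smaller subset.
    """
    curve: List[float] = []
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        s = score(list(ranked[:k]))
        curve.append(s)
        if s > best_score:
            best_k, best_score = k, s
    return list(ranked[:best_k]), curve

# Toy usage: a hypothetical scorer whose "MCC" peaks when three features are used
def toy_score(subset: List[int]) -> float:
    return 0.3 - 0.05 * abs(len(subset) - 3)

best, curve = incremental_feature_selection([4, 0, 2, 1, 3], toy_score)
```

Plotting `curve` against k gives a curve of the kind shown in Fig. 1; the peak marks the selected subset size.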

Table 3  Performance comparison with other machine learning algorithms

Algorithm  Accuracy  Sensitivity  Specificity  MCC
RF         0.7398    0.3529       0.7687       0.072
SVM        0.4101    0.1373       0.4305       0.000
LapSVM     0.8872    0.5418       0.9115       0.334