Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue 2: 76-82. doi: 10.6040/j.issn.1672-3961.0.2019.292

Machine Learning & Data Mining

Prediction of microRNA-binding residues based on Laplacian support vector machine and sequence information

Xin MA¹, Xue WANG²

1. School of Statistics and Mathematics, Nanjing Audit University, Nanjing 211815, Jiangsu, China
2. Experimental Center, Nanjing Audit University, Nanjing 211815, Jiangsu, China
Received: 2019-06-06; Online: 2020-04-20; Published: 2020-04-16

Abstract:

A new semi-supervised learning method was proposed to predict miRNA-binding residues in protein sequences. The prediction model combines the Laplacian support vector machine (LapSVM) algorithm with newly proposed hybrid features. These hybrid features combine three sources:
- secondary structure information;
- HKM features;
- a new feature that joins amino acid physicochemical properties with evolutionary information.

A comparison of the feature groups showed that the new feature contributed most to the improvement in prediction. With feature selection, the LapSVM model achieved an accuracy of 88.72%, a sensitivity of 54.18%, and a specificity of 91.15%. The LapSVM model significantly outperformed other approaches to miRNA-binding site prediction.
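The abstract names LapSVM but does not give its formulation here. The sketch below shows the manifold-regularization idea behind it (Belkin, Niyogi and Sindhwani) under several assumptions:
- a primal formulation with the squared hinge loss, in the spirit of Melacci and Belkin's primal LapSVM, trained by K-preconditioned gradient descent rather than by solving the dual quadratic program;
- an RBF kernel;
- a 0/1 k-nearest-neighbour graph over all residues, labelled and unlabelled.

The function names and all parameter values (kernel width, `k`, `gamma_a`, `gamma_i`, learning rate, epochs) are hypothetical. They are not the authors' settings.

```python
import math


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (width is an assumption)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_laplacian(X, k=3):
    """Unnormalised graph Laplacian L = D - W of a symmetric kNN graph with 0/1 weights."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dist = sorted((sum((p - q) ** 2 for p, q in zip(X[i], X[j])), j)
                      for j in range(n) if j != i)
        for _, j in dist[:k]:
            W[i][j] = W[j][i] = 1.0  # symmetrise the neighbourhood graph
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def train_lapsvm(X, y, gamma_a=1e-2, gamma_i=1.0, k=3, lr=0.05, epochs=500):
    """Hypothetical primal LapSVM sketch.

    X: all samples, labelled ones first; y: +1/-1 labels of the first len(y) samples.
    Minimises  (1/l) sum_i max(0, 1 - y_i f_i)^2 + gamma_a a'Ka + (gamma_i/n^2) f'Lf,
    with f = K a + b evaluated on every sample (labelled and unlabelled).
    """
    n, l = len(X), len(y)
    K = [[rbf(p, q) for q in X] for p in X]
    L = knn_laplacian(X, k)
    alpha, b = [0.0] * n, 0.0
    for _ in range(epochs):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) + b for i in range(n)]
        # derivative of the squared hinge loss w.r.t. f (zero for unlabelled samples)
        g = [0.0] * n
        for i in range(l):
            margin = 1.0 - y[i] * f[i]
            if margin > 0:
                g[i] = -2.0 * y[i] * margin / l
        Lf = [sum(L[i][j] * f[j] for j in range(n)) for i in range(n)]
        # The exact gradient w.r.t. alpha is K times this vector; dropping the K
        # factor (preconditioning) still gives a descent direction and avoids K*K.
        for i in range(n):
            alpha[i] -= lr * (g[i] + 2 * gamma_a * alpha[i]
                              + 2 * gamma_i / n ** 2 * Lf[i])
        b -= lr * sum(g)  # the Laplacian term has zero gradient w.r.t. b (L1 = 0)
    return lambda x: sum(a * rbf(xi, x) for a, xi in zip(alpha, X)) + b
```

In this toy setting, only one residue per cluster is labelled. The graph term encourages the decision function to stay smooth along each cluster, so the unlabelled residues are also shaped by the fit. The dual QP solver or a successive over-relaxation solver would be the more standard choices for real data.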

Key words: microRNA-binding residues, Laplacian support vector machine, evolutionary information, physicochemical properties, feature selection

CLC Number: Q811.4

Table 1 Main dataset collected from PDB and UniProt

Database | Protein sequence IDs
PDB | 2LI8_A, 2N82_B, 3A6P_A, 3A6P_C, 3ADI_A, 3ADL_A, 3TRZ_A, 4L8R_C, 4NGB_A, 4QOZ_B, 4W5N_A
UniProt | O04379, O04492, P09651, P43243, P48432, P98175, Q01860, Q06787, Q1PRL4, Q2VB19, Q3UHX9, Q4R979, Q5D1E8, Q5RCW2, Q6GLC9, Q80U58, Q8CJF8, Q8K3Y3, Q8R205, Q8R418, Q9JIK5, Q9R0B7, Q9SKN5, Q9U489, Q9UGR2, Q9XGW1, Q9ZVD5

Table 2 Predictive performance of the LapSVM model based on various features

Feature | Accuracy | Sensitivity | Specificity | MCC
PSSM | 0.7112 | 0.2157 | 0.7482 | 0.021
PSSMPP | 0.7820 | 0.3333 | 0.8155 | 0.096
PSSMPP+SS | 0.8365 | 0.4510 | 0.8653 | 0.221
PSSMPP+HKM | 0.8316 | 0.4142 | 0.8611 | 0.188
PSSMPP+SS+HKM | 0.8752 | 0.4933 | 0.9035 | 0.291
Optimal 44 features | 0.8872 | 0.5418 | 0.9115 | 0.334
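Table 2 shows performance rising as the PSSMPP, SS, and HKM groups are combined. A common way to feed such per-residue descriptors to a residue-level classifier has two steps. First, join the feature blocks column-wise. Second, describe each residue by a sliding window over its sequence neighbours.

This excerpt does not give the authors' exact encoding. The helper names, the window half-width, and the zero padding at the chain ends below are all assumptions for illustration.

```python
def hybrid_features(*blocks):
    """Join per-residue feature blocks (e.g. PSSMPP, SS, HKM) column-wise.

    Each block is a list with one feature vector per residue; all blocks
    must cover the same residues in the same order.
    """
    return [sum((list(block[i]) for block in blocks), [])
            for i in range(len(blocks[0]))]


def window_features(per_residue, half_window=2):
    """Concatenate each residue's vector with those of its neighbours.

    Positions beyond either end of the sequence are zero-padded, so every
    residue gets a vector of length (2 * half_window + 1) * dim.
    """
    dim = len(per_residue[0])
    pad = [0.0] * dim
    n = len(per_residue)
    out = []
    for i in range(n):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            row.extend(per_residue[j] if 0 <= j < n else pad)
        out.append(row)
    return out
```

For example, `window_features(hybrid_features(pssm, ss), half_window=1)` turns three residues with 2 + 1 features each into three 9-dimensional vectors. Here `pssm` and `ss` are hypothetical per-residue lists.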

Fig. 1 MCC curve of the models constructed from 165 feature subsets
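A curve of MCC over 165 feature subsets, with an optimum at 44 features, is what incremental feature selection typically produces. In that procedure, features are ranked once, and the nested subsets made of the top 1, 2, ... features are each scored.

The sketch below assumes this scheme and a precomputed ranking. The excerpt does not confirm either the ranking criterion or the evaluation protocol. `evaluate` is a hypothetical callback, for example cross-validated MCC of a LapSVM trained on the subset.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score nested subsets made of the top-1, top-2, ... ranked features.

    ranked_features: feature indices ordered from most to least relevant.
    evaluate: callable returning a score (e.g. MCC) for a list of feature indices.
    Returns (best_subset, best_score, curve), where curve holds every score.
    """
    curve = []
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        curve.append(score)
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, curve
```

Plotting `curve` against the subset size gives a figure of the same kind as Fig. 1. The returned subset corresponds to the "Optimal 44 features" row of Table 2.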

Table 3 Performance comparison with other machine learning algorithms

Algorithm | Accuracy | Sensitivity | Specificity | MCC
RF | 0.7398 | 0.3529 | 0.7687 | 0.072
SVM | 0.4101 | 0.1373 | 0.4305 | 0.000
LapSVM | 0.8872 | 0.5418 | 0.9115 | 0.334
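Tables 2 and 3 report four measures. Below are their standard definitions in terms of the binary confusion matrix, treating binding residues as the positive class. We assume these are the definitions behind the tables.
- Sensitivity is recall on binding residues.
- Specificity is recall on non-binding residues.
- MCC summarises all four cells. This makes it more informative than accuracy on imbalanced residue data.

Returning 0 when a ratio is undefined is a common convention, assumed here.

```python
import math


def binding_site_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on binding residues
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-binding residues
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return accuracy, sensitivity, specificity, mcc
```

For instance, `binding_site_metrics(8, 80, 10, 2)` gives accuracy 0.88 and sensitivity 0.8 on a toy matrix. A model that predicts every residue as non-binding gets an MCC of 0 despite high accuracy, which is why MCC is reported alongside accuracy.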