Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (1): 1-6. doi: 10.6040/j.issn.1672-3961.1.2013.029

• Machine Learning and Data Mining •

Protein sequence identification based on improved TIGA-S4VM algorithm

WANG Xiao-feng, SUI Ting-ting*

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2013-05-14 Online: 2014-02-20 Published: 2013-05-14
  • Corresponding author: SUI Ting-ting (1988-), female, from Zhoukou, Henan; Ph.D. candidate; research interests: data mining, pattern recognition, and bioinformatics. E-mail: suisui61@163.com
  • About the author: WANG Xiao-feng (1958-), male, from Dengta, Liaoning; Ph.D. in engineering, professor; research interests: artificial intelligence and its applications in traffic information and control engineering, data mining and knowledge discovery, and bioinformatics. E-mail: xfwang@shmtu.edu.cn
  • Supported by:

    National Natural Science Foundation of China (61003093)

Abstract:

The safe semi-supervised support vector machine (S4VM) suffers from blind parameter selection and from imbalanced proportions of positive and negative samples. To address these problems, TIGA-S4VM, a protein sequence identification method, was developed by combining an improved TF-IDF (term frequency-inverse document frequency) algorithm, a genetic algorithm (GA), and S4VM. The improved TF-IDF algorithm, LBTF-IDF, extracts the feature terms from the protein sequences. The frequency with which each feature term occurs in a protein sequence is normalized and used as a feature value of the identification model, and GA and S4VM are then combined to identify the protein sequences. Experimental results show that TIGA-S4VM outperforms five other identification methods and identifies protein sequences effectively even when the proportion of training samples is low.

Key words: GA, protein sequence identification, semi-supervised algorithm, S4VM, TF-IDF, support vector machine
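
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of LBTF-IDF, so the code below substitutes standard TF-IDF computed over overlapping k-mers; that substitution, the function names (`kmers`, `select_terms`, `feature_vector`), and the parameters `k` and `top_n` are all assumptions made for illustration, not the authors' method. The GA parameter search and the S4VM classifier are not shown.

```python
import math
from collections import Counter


def kmers(seq, k=2):
    """Return the overlapping k-mers ("terms") of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def select_terms(seqs, k=2, top_n=5):
    """Rank terms by standard TF-IDF summed over all sequences; keep top_n.

    Assumption: plain TF-IDF stands in for the paper's LBTF-IDF.
    """
    docs = [Counter(kmers(s, k)) for s in seqs]
    n_docs = len(docs)
    # Document frequency: the number of sequences containing each term.
    df = Counter(term for d in docs for term in d)
    score = Counter()
    for d in docs:
        total = sum(d.values())
        for term, count in d.items():
            tf = count / total
            idf = math.log(n_docs / df[term])
            score[term] += tf * idf
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return ranked[:top_n]


def feature_vector(seq, terms, k=2):
    """Counts of the selected terms in seq, normalized to sum to 1.

    A sequence containing none of the terms yields an all-zero vector.
    """
    counts = Counter(kmers(seq, k))
    raw = [counts[t] for t in terms]
    total = sum(raw)
    return [c / total if total else 0.0 for c in raw]
```

Under these assumptions, `select_terms` would be run once on the training sequences, and `feature_vector` would then map each sequence to the fixed-length input expected by a classifier.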
