Predicting Gene Ontology functions based on support vector machines and statistical significance estimation

  • Authors:
  • Ran Bi;Yanhong Zhou;Feng Lu;Weiqiang Wang

  • Affiliations:
  • Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China (all authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2007

Abstract

Gene Ontology (GO) is a common language for the functional annotation of gene products. We have developed a computational tool, GOKey, that predicts the GO functions of proteins from their sequence features using support vector machines (SVMs). To improve prediction performance, GOKey adopts several measures, including improved handling of the imbalance between positive and negative training data and postprocessing strategies that estimate the posterior probability and statistical significance of SVM outputs. GOKey has been trained to predict the 36 'molecular function' categories of the GO slims and could easily be extended to other GO categories. Five-fold cross-validation on 10,603 GO-mapped proteins shows that GOKey performs better than standard SVMs, and comparisons with other computational tools for GO function prediction show that its performance is satisfactory. GOKey has also been applied to predict GO functions for 5,381 novel human proteins in the Ensembl database: 93% of these proteins could be assigned one or more GO terms, and some evidence supporting the predictions has been found. GOKey can be accessed at http://infosci.hust.edu.cn.
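
The abstract names three ingredients without giving their details: handling of unbalanced training data, posterior probabilities for SVM outputs, and a statistical significance estimate. The sketch below illustrates one common way each could be done, using only the Python standard library. It is not GOKey's implementation. The inverse-frequency class weights, the Platt-style sigmoid calibration, the empirical p-value, and all function names and toy scores are assumptions made for illustration.

```python
import math


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (labels are +1 / -1).

    Assumption: a standard heuristic for unbalanced data, not GOKey's scheme.
    """
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    return {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit P(y=+1 | s) = 1 / (1 + exp(a*s + b)) by gradient descent on log loss.

    This is a Platt-style calibration of raw SVM decision values (assumed here).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            t = 1.0 if y == 1 else 0.0
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # Gradient of the log loss with respect to (a*s + b) is (t - p).
            grad_a += (t - p) * s
            grad_b += (t - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def posterior(score, a, b):
    """Posterior probability of the positive class for one decision value."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def empirical_p_value(score, negative_scores):
    """Share of negative-class scores at least as large as `score`.

    The +1 terms keep the estimate strictly above zero.
    """
    at_least = sum(1 for s in negative_scores if s >= score)
    return (at_least + 1) / (len(negative_scores) + 1)


# Hypothetical decision values for one GO category (5 negatives, 3 positives).
toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.2, 1.0, 1.5, 2.0]
toy_labels = [-1, -1, -1, -1, -1, 1, 1, 1]

weights = balanced_class_weights(toy_labels)
a, b = fit_sigmoid(toy_scores, toy_labels)
negatives = [s for s, y in zip(toy_scores, toy_labels) if y == -1]
p_value = empirical_p_value(1.8, negatives)
```

In a multi-category setting such as the 36 GO slim categories, a sketch like this would be repeated once per category, so that each category gets its own weights, calibration parameters, and negative-score distribution.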