g-MARS: Protein Classification Using Gapped Markov Chains and Support Vector Machines

  • Authors:
  • Xiaonan Ji; James Bailey; Kotagiri Ramamohanarao

  • Affiliations:
  • NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, University of Melbourne, Australia (all three authors)

  • Venue:
  • PRIB '08 Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2008

Abstract

Classifying protein sequences has important applications in areas such as disease diagnosis, treatment development and drug design. In this paper, we present a highly accurate classifier called the g-MARS (gapped Markov Chain with Support Vector Machine) protein classifier. It models the structure of a protein sequence by measuring the transition probabilities between pairs of amino acids, which yields a Markov-chain-style model for each protein sequence. To capture the similarity among protein sequences that do not match exactly, we then show that this model can be generalized to incorporate gaps in the Markov chain. We perform a thorough experimental study and compare g-MARS to several other state-of-the-art protein classifiers. Overall, we demonstrate that g-MARS has superior accuracy and operates efficiently on a diverse range of protein families.
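The abstract describes each sequence by the transition probabilities between pairs of amino acids, generalized to pairs separated by a gap. The following is a minimal Python sketch of that kind of feature extraction. It is not the authors' implementation. Several choices here are assumptions: the hypothetical names `gapped_transition_matrix` and `gapped_markov_features`, the convention that `gap=0` means adjacent residues, row-wise normalization, and skipping non-standard residues. The abstract does not give the exact feature definition, gap range, or SVM kernel.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def gapped_transition_matrix(seq: str, gap: int) -> list[list[float]]:
    """Hypothetical helper: estimate P(b | a) for residues a, b that are gap+1 positions apart.

    Assumption: gap=0 compares adjacent residues (an ordinary first-order
    Markov chain); gap=k skips k intervening residues.
    """
    n = len(AMINO_ACIDS)
    counts = [[0] * n for _ in range(n)]
    step = gap + 1
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        # Assumption: pairs involving non-standard residues (e.g. 'X') are ignored.
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    # Normalize each row so it is a conditional distribution over the next residue;
    # rows for residues with no observed transitions stay all zeros.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def gapped_markov_features(seq: str, max_gap: int) -> list[float]:
    """Hypothetical helper: concatenate flattened matrices for gaps 0..max_gap."""
    features: list[float] = []
    for gap in range(max_gap + 1):
        for row in gapped_transition_matrix(seq.upper(), gap):
            features.extend(row)
    return features


if __name__ == "__main__":
    vec = gapped_markov_features("MKTAYIAKQR", max_gap=2)
    print(len(vec))  # 3 gap values x 400 residue pairs
```

With `max_gap=2`, each sequence becomes a fixed-length vector of 3 × 400 = 1,200 probabilities regardless of its length. This fixed length is what makes such a representation usable as input to a standard SVM. Larger gaps let two sequences share features even when intervening residues differ, which matches the abstract's motivation of matching sequences that are similar but not identical.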