Discriminative remote homology detection using maximal unique sequence matches

  • Authors:
  • Hasan Oğul; Ü. Erkan Mumcuoğlu

  • Affiliations:
  • Department of Computer Engineering, Baskent University, Ankara, Turkey; Information Systems and Health Informatics, Informatics Institute, Middle East Technical University, Ankara, Turkey

  • Venue:
  • IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
  • Year:
  • 2005

Abstract

We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the length of all non-overlapping MUMs is used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines are employed for the binary classification, as in recent work. The new method is shown to be more accurate than recent methods including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method performs much better than SVM-Pairwise.
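
The comparison scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how overlaps are resolved or how the lengths are combined, so the greedy longest-first selection, the sum of lengths as the score, the `min_len` threshold, and all function names below are assumptions made for illustration. A practical implementation would likely find MUMs with a suffix tree rather than the brute-force scan used here.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=2):
    """Return (start_a, start_b, length) for each MUM of at least min_len.

    A MUM is taken here to be a common substring that cannot be extended
    left or right and that occurs exactly once in each sequence.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could still be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            match = a[i:i + k]
            if (k >= min_len
                    and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, k))
    return mums


def mum_similarity(a, b, min_len=2):
    """Assumed score: total length of greedily chosen non-overlapping MUMs."""
    chosen = []
    candidates = sorted(maximal_unique_matches(a, b, min_len),
                        key=lambda m: -m[2])  # longest first (assumption)
    for i, j, k in candidates:
        overlaps = any(
            (i < ci + ck and ci < i + k) or (j < cj + ck and cj < j + k)
            for ci, cj, ck in chosen
        )
        if not overlaps:
            chosen.append((i, j, k))
    return sum(k for _, _, k in chosen)


def feature_vector(protein, training_set, min_len=2):
    """Pairwise similarity of one protein against every training protein."""
    return [mum_similarity(protein, t, min_len) for t in training_set]


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    train = ["WWCDEFWWHIK", "ACDEFGHIK", "WWWW"]
    print(feature_vector("ACDEFGHIK", train))
```

Each protein is thus represented by a vector with one entry per training sequence, and those vectors are what a binary SVM classifier would be trained on for each family.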