Brief communication: Integrating subcellular location for improving machine learning models of remote homology detection in eukaryotic organisms

  • Authors:
  • Anuj R. Shah; Christopher S. Oehmen; Jill Harper; Bobbie-Jo M. Webb-Robertson

  • Affiliations:
  • Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA; Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA; -; Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2007

Abstract

A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor despite substantial divergence in their primary sequences. Remote homologs often share less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained on a fusion of sequence similarity scores and subcellular location predictions. SVMs have been shown to perform well in a variety of applications where the goal is binary classification of data. At the same time, data fusion methods have been shown to be highly effective in enhancing the discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in the identification of significantly more remote homologs (p-value