Protein homology detection with biologically inspired features and interpretable statistical models

  • Authors: Pai-Hsi Huang; Vladimir Pavlovic

  • Affiliations: Department of Computer Science, Rutgers University, Piscataway, NJ 08854-8019, USA (both authors)

  • Venue: International Journal of Data Mining and Bioinformatics
  • Year: 2008

Abstract

Computational classification of proteins using methods such as string kernels and Fisher-SVM has demonstrated great success. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In this work, we propose a biologically motivated feature set combined with a sparse classifier, based on a small subset of positions and residues in protein sequences, for protein superfamily detection. We show that the performance of our models is comparable to that of state-of-the-art methods on a benchmark dataset. The set of sparse critical features discovered by the models is consistent with confirmed biological findings.
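
To make the idea of a sparse classifier over positions and residues concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the feature construction or the classifier, so the sketch assumes one binary feature per (position, residue) pair of an aligned sequence and an L1-regularized logistic regression fitted by proximal gradient descent. The toy sequences, the function names (encode, fit, critical_features), and the hyperparameters are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
K = len(AMINO_ACIDS)

def encode(seq):
    """One binary feature per (position, residue) pair of an aligned sequence."""
    x = [0.0] * (len(seq) * K)
    for pos, res in enumerate(seq):
        idx = AMINO_ACIDS.find(res)
        if idx >= 0:  # gaps or unknown symbols leave the position all-zero
            x[pos * K + idx] = 1.0
    return x

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def score(w, b, x):
    """Linear score; a positive value means 'predicted member'."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit(X, y, lam=0.1, lr=0.5, iters=500):
    """L1-regularized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score(w, b, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj
        b -= lr * grad_b / n  # the bias is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def critical_features(w):
    """Return the (position, residue) pairs whose weight is nonzero."""
    return [(j // K, AMINO_ACIDS[j % K]) for j, wj in enumerate(w) if wj != 0.0]

# Hypothetical toy data: members share a C at position 1 (0-based);
# all other positions are identical across the two classes.
positives = ["ACDE", "GCHK", "LCMN", "PCQR"]
negatives = ["AWDE", "GYHK", "LVMN", "PTQR"]
X = [encode(s) for s in positives + negatives]
y = [1.0] * len(positives) + [0.0] * len(negatives)

w, b = fit(X, y)
critical = critical_features(w)
print(critical)
```

On this toy data the L1 penalty should leave a single nonzero weight, the one for residue C at position 1, which is the kind of directly readable "critical feature" the abstract refers to. Real superfamily data would need a proper alignment and a tuned penalty strength.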