Comparison of probabilistic combination methods for protein secondary structure prediction

  • Authors:
  • Yan Liu;Jaime Carbonell;Judith Klein-Seetharaman;Vanathi Gopalakrishnan

  • Affiliations:
  • Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Abstract

Motivation: Protein secondary structure prediction is an important step towards understanding how proteins fold in three dimensions. Recent information-theoretic analysis indicates that the correlation between neighboring secondary structures is much stronger than that between neighboring amino acids. In this article, we focus on the combination problem for sequences, i.e. combining the scores or assignments from single or multiple prediction systems under the constraint of a whole sequence, as a target for improvement in protein secondary structure prediction.

Results: We apply several graphical chain models to solve the combination problem and show that they are consistently more effective than the traditional window-based methods. In particular, conditional random fields (CRFs) moderately improve the predictions for helices and, more importantly, for beta sheets, which are the major bottleneck for protein secondary structure prediction.
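The difference between deciding each residue on its own and combining scores under a whole-sequence constraint can be illustrated with a small sketch. This is not the authors' CRF: it is plain Viterbi decoding over a linear chain, and the per-residue scores, the transition penalties, and the function names are all hypothetical values chosen for illustration. A CRF would learn such weights from data rather than have them set by hand.

```python
STATES = ["H", "E", "C"]  # helix, strand, coil


def independent_argmax(scores):
    """Baseline: pick the best-scoring label at each residue in isolation."""
    return [max(STATES, key=lambda s: row[s]) for row in scores]


def viterbi(scores, trans):
    """Return the label sequence that maximizes the sum of per-residue
    scores plus the transition scores between neighboring labels."""
    if not scores:
        return []
    best = [{s: scores[0][s] for s in STATES}]  # best path score ending in s
    back = []                                   # back-pointers per position
    for i in range(1, len(scores)):
        cur, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + trans[p][s])
            cur[s] = best[-1][prev] + trans[prev][s] + scores[i][s]
            ptr[s] = prev
        best.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical log-scores from some per-residue predictor (assumed values).
confident_helix = {"H": -0.2, "E": -2.0, "C": -2.5}
noisy_residue = {"H": -1.0, "E": -0.8, "C": -3.0}
scores = [confident_helix, confident_helix, noisy_residue,
          confident_helix, confident_helix]

# Hypothetical transition scores: no cost to stay, a penalty to switch label.
trans = {p: {s: (0.0 if p == s else -1.5) for s in STATES} for p in STATES}

print("".join(independent_argmax(scores)))
print("".join(viterbi(scores, trans)))
```

In this toy input the third residue slightly favors strand when judged alone, so the independent baseline breaks the helix, while the chain decoding keeps it intact because two label switches cost more than the small local gain.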