Learning Models for Aligning Protein Sequences with Predicted Secondary Structure

Authors:
Eagu Kim;Travis Wheeler;John Kececioglu
Affiliations:
Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA;Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA;Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA
Venue:
RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Year:
2009

Abstract

Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cost functions. We present efficient algorithms for computing optimal pairwise alignments under these models, all of which run in near-quadratic time. We also review inverse alignment, an approach to learning the values of the parameters in these models. We then evaluate the accuracy of these models by studying how well an optimal alignment under each model recovers known benchmark reference alignments. Our experiments show that, with parameters learned by inverse alignment, these new secondary-structure-based models provide a significant improvement in alignment accuracy for distant sequences. The best model improves upon the accuracy of the standard sequence alignment model for pairwise alignment by as much as 15% for sequences with less than 25% identity, and improves the accuracy of multiple alignment by 20% for difficult benchmarks whose average accuracy under standard tools is less than 40%.
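The general idea of letting predicted secondary structure modify both substitution and gap costs can be illustrated with a minimal sketch. This is not one of the paper's models, whose definitions are not given in the abstract. Everything here is an assumption for illustration: the three-state (helix, strand, coil) confidence tuples, the function names `sub_cost`, `gap_cost`, and `align_cost`, the particular cost formulas, and the use of linear rather than affine gap costs. The dynamic program is the standard global alignment recurrence and runs in quadratic time.

```python
from typing import List, Tuple

# Hypothetical representation: per-residue prediction confidences
# for (helix, strand, coil), assumed to sum to 1.
Pred = Tuple[float, float, float]

def sub_cost(x: str, y: str, p: Pred, q: Pred, struct_weight: float) -> float:
    """Illustrative substitution cost: a 0/1 mismatch cost plus a penalty
    that grows as the two predicted structure distributions disagree."""
    base = 0.0 if x == y else 1.0
    agreement = sum(pi * qi for pi, qi in zip(p, q))
    return base + struct_weight * (1.0 - agreement)

def gap_cost(p: Pred, gap: float, struct_weight: float) -> float:
    """Illustrative gap cost: deleting a residue costs more when it is
    confidently predicted to lie in a helix or strand (low coil confidence)."""
    return gap * (1.0 + struct_weight * (1.0 - p[2]))

def align_cost(a: str, b: str, pa: List[Pred], pb: List[Pred],
               gap: float = 2.0, struct_weight: float = 1.0) -> float:
    """Minimum cost of a global alignment of a and b under the
    illustrative costs above, with linear (per-residue) gap costs."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Boundary: align a prefix of one sequence against the empty string.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap_cost(pa[i - 1], gap, struct_weight)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost(pb[j - 1], gap, struct_weight)
    # Interior: substitute, or gap a residue of a, or gap a residue of b.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1],
                                           pa[i - 1], pb[j - 1], struct_weight),
                D[i - 1][j] + gap_cost(pa[i - 1], gap, struct_weight),
                D[i][j - 1] + gap_cost(pb[j - 1], gap, struct_weight),
            )
    return D[n][m]

if __name__ == "__main__":
    H: Pred = (1.0, 0.0, 0.0)  # confidently helix
    C: Pred = (0.0, 0.0, 1.0)  # confidently coil
    print(align_cost("ACD", "AD", [H, H, C], [H, C]))
```

In this sketch the parameters `gap` and `struct_weight` are fixed by hand. In the approach the abstract describes, such parameter values would instead be learned from reference alignments by inverse alignment.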