Simple and fast inverse alignment

Authors:
John Kececioglu;Eagu Kim
Affiliations:
Department of Computer Science, The University of Arizona, Tucson, AZ;Department of Computer Science, The University of Arizona, Tucson, AZ
Venue:
RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology
Year:
2006

Citing 5
Cited 5

Fast and numerically stable parametric alignment of biosequences

RECOMB '97 Proceedings of the first annual international conference on Computational molecular biology
Finding the k Shortest Paths

SIAM Journal on Computing
Bounds for parametric sequence comparison

Discrete Applied Mathematics
Aligning alignments exactly

RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Inverse parametric sequence alignment

Journal of Algorithms

Learning Scoring Schemes for Sequence Alignment from Partial Examples

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Learning Models for Aligning Protein Sequences with Predicted Secondary Structure

RECOMB'09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Support vector training of protein alignment models

RECOMB'07 Proceedings of the 11th annual international conference on Research in computational molecular biology
Learning to align: a statistical approach

IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
Inverse sequence alignment from partial examples

WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics

Abstract

For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. While some choices for substitution scores are now common, largely due to convention, there is no standard for choosing gap penalties. An objective way to resolve this question is to learn the appropriate values by solving the Inverse String Alignment Problem: given examples of correct alignments, find parameter values that make the examples optimal-scoring alignments of their strings. We present a new polynomial-time algorithm for Inverse String Alignment that is simple to implement, fast in practice, and for the first time can learn hundreds of parameters simultaneously. The approach is also flexible: minor modifications allow us to solve inverse unique alignment (find parameter values that make the examples the unique optimal alignments of their strings) and inverse near-optimal alignment (find parameter values that make the example alignments as close to optimal as possible). Computational results with an implementation for global alignment show that, for the first time, we can find best-possible values for all 212 parameters of the standard protein-sequence scoring model from hundreds of alignments in a few minutes of computation.
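
The condition that Inverse String Alignment asks for can be illustrated with a small sketch. It is not the paper's algorithm, which the abstract does not describe; it only checks, for one candidate parameter choice, whether an example alignment is optimal and by how much it misses. The two-parameter cost model (a mismatch cost and a per-column gap cost, with matches costing zero), the minimization convention, and all function names are assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical toy model: (mismatch cost, per-column gap cost); matches cost 0.
Params = Tuple[float, float]


def example_cost(row_a: str, row_b: str, params: Params) -> float:
    """Cost of a given alignment, written as two equal-length rows with '-' for gaps."""
    mismatch, gap = params
    assert len(row_a) == len(row_b), "alignment rows must have equal length"
    cost = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            cost += gap
        elif x != y:
            cost += mismatch
    return cost


def optimal_cost(a: str, b: str, params: Params) -> float:
    """Minimum global alignment cost of strings a and b by dynamic programming."""
    mismatch, gap = params
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = min(sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def optimality_gap(row_a: str, row_b: str, params: Params) -> float:
    """How far the example alignment is from optimal under params (0 means optimal)."""
    a = row_a.replace("-", "")
    b = row_b.replace("-", "")
    return example_cost(row_a, row_b, params) - optimal_cost(a, b, params)
```

In this sketch, a parameter choice solves the inverse problem for a set of examples when `optimality_gap` is zero for every example. The near-optimal variant mentioned in the abstract would instead look for parameters that make these gaps as small as possible.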