Inverse sequence alignment from partial examples

Authors:
Eagu Kim; John Kececioglu
Affiliations:
Department of Computer Science, The University of Arizona, Tucson, AZ; Department of Computer Science, The University of Arizona, Tucson, AZ
Venue:
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
Year:
2007

Citing 7

Combinatorial optimization

Setting Parameters by Example
SIAM Journal on Computing

Inverse parametric sequence alignment
Journal of Algorithms

Multiple alignment by aligning alignments
Bioinformatics

Support vector training of protein alignment models
RECOMB'07 Proceedings of the 11th annual international conference on Research in computational molecular biology

CONTRAlign: discriminative training for protein sequence alignment
RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology

Simple and fast inverse alignment
RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology

Cited 2

Learning Scoring Schemes for Sequence Alignment from Partial Examples
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Learning Models for Aligning Protein Sequences with Predicted Secondary Structure
RECOMB 2'09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology

Abstract

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for biological sequences is inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the example alignments score close to optimal. We extend prior work on inverse alignment to partial examples and to an improved model based on minimizing the average error of the examples. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the recovery rate for multiple sequence alignment by up to 25%.
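
The quantity that inverse alignment tries to drive down can be illustrated with a minimal sketch. It is not the authors' method: it assumes a hypothetical two-parameter cost model (a single mismatch cost and a linear per-gap-column cost, with matches free), complete rather than partial examples, and absolute rather than relative error. The function names are invented for illustration. For a given parameter choice, it measures how far each example alignment is from the optimal alignment cost of the same two sequences and averages that gap over the examples.

```python
def alignment_cost(row_a, row_b, mismatch, gap):
    """Cost of a given alignment, written as two equal-length gapped rows."""
    assert len(row_a) == len(row_b)
    cost = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            cost += gap          # column with a gap (assumed: never gap over gap)
        elif x != y:
            cost += mismatch     # substitution column; matches cost nothing
    return cost


def optimal_cost(a, b, mismatch, gap):
    """Minimum global alignment cost of a and b by standard dynamic programming."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def average_error(examples, mismatch, gap):
    """Mean of (example cost - optimal cost) over complete example alignments."""
    total = 0.0
    for row_a, row_b in examples:
        a = row_a.replace("-", "")   # recover the ungapped sequences
        b = row_b.replace("-", "")
        total += alignment_cost(row_a, row_b, mismatch, gap) - optimal_cost(a, b, mismatch, gap)
    return total / len(examples)


examples = [("ACGT", "AGGT")]            # toy example: one substitution column
print(average_error(examples, 1, 1))     # example is optimal under these costs
print(average_error(examples, 3, 1))     # two gaps now beat the substitution
```

In this toy case the example alignment is optimal when a mismatch costs the same as a gap, but becomes suboptimal once a mismatch costs more than two gaps, which is the sensitivity to parameter values that the abstract describes. A parameter search would also need a normalization constraint, since setting every cost to zero trivially yields zero error.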