Learning Scoring Schemes for Sequence Alignment from Partial Examples

Authors:
Eagu Kim;John Kececioglu
Affiliations:
University of Arizona, Tucson;University of Arizona, Tucson
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2008



Abstract

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.
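To make the objective in the abstract concrete, the following is a minimal sketch of how the average error between example scores and optimal scores could be evaluated for one choice of parameters. It is not the paper's algorithm. It assumes a deliberately simple cost model that the abstract does not specify (a single mismatch cost and a linear per-column gap cost, with lower cost being better), it handles only complete examples rather than partial ones, and all function names (`features`, `optimal_cost`, `average_error`) are hypothetical.

```python
def features(row_a, row_b):
    """Count (mismatches, gap columns) in an alignment given as two gapped rows."""
    mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x != y:
            mismatches += 1
    return mismatches, gaps


def optimal_cost(s, t, mismatch, gap):
    """Minimum alignment cost of s and t by the standard dynamic program.

    Assumed model: identities cost 0, substitutions cost `mismatch`,
    and each gap column costs `gap`.
    """
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def average_error(examples, mismatch, gap):
    """Mean of (example cost - optimal cost) over complete example alignments.

    Each example is itself a feasible alignment of its sequences, so every
    term is non-negative; zero means the example is optimal under the
    given parameters.
    """
    total = 0.0
    for row_a, row_b in examples:
        n_mismatch, n_gap = features(row_a, row_b)
        example_cost = mismatch * n_mismatch + gap * n_gap
        s, t = row_a.replace("-", ""), row_b.replace("-", "")
        total += example_cost - optimal_cost(s, t, mismatch, gap)
    return total / len(examples)


# Toy example: an ungapped alignment with two substitutions.
examples = [("ACGT", "AGCT")]
for gap in (0.25, 1, 2):
    print(gap, average_error(examples, mismatch=1, gap=gap))
```

In this toy case a very cheap gap cost makes the example look far from optimal, because two gap columns become cheaper than its two substitutions, while larger gap costs bring the error to zero. Note that setting every parameter to zero would trivially give zero error, so this sketch holds the mismatch cost fixed at 1; a real formulation needs some normalization or constraint of that kind.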