Learning to align: a statistical approach

Authors:
Elisa Ricci;Tijl De Bie;Nello Cristianini
Affiliations:
Dept. of Electronic and Information Engineering, University of Perugia, Perugia, Italy;Dept. of Engineering Mathematics, University of Bristol, Bristol, UK;Dept. of Engineering Mathematics, University of Bristol, Bristol, UK and Dept. of Computer Science, University of Bristol, Bristol, UK
Venue:
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
Year:
2007

Abstract

We present a new machine learning approach to the inverse parametric sequence alignment problem: given a set of correct pairwise global alignments as training examples, find the parameter values that make these alignments optimal. We consider the distribution of the scores of all incorrect alignments and search for the parameters for which the score of the given alignments is as far as possible from the mean of this distribution, measured in standard deviations. This normalized distance is known in statistics as the Z-score. We show that the Z-score is a function of the parameters and can be computed with efficient dynamic programs similar to the Needleman-Wunsch algorithm. We also show that maximizing the Z-score reduces to a simple quadratic program. Experimental results demonstrate the effectiveness of the proposed approach.
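
To make the Z-score idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple linear scoring model with three hypothetical parameters (match, mismatch, gap) and takes the moments over all global alignments of the two sequences, including the given one, which is a simplification of "all incorrect alignments". The function names score_moments and z_score are illustrative only. The recursion visits the same cells as Needleman-Wunsch, but accumulates the number of alignments, the sum of their scores, and the sum of their squared scores instead of a maximum.

```python
import math


def score_moments(x, y, match, mismatch, gap):
    """Count all global alignments of x and y and return (count, mean, variance)
    of their scores under a linear match/mismatch/gap model (assumed here)."""
    n, m = len(x), len(y)
    # For prefixes x[:i], y[:j]: number of alignments, sum of scores,
    # and sum of squared scores over all of those alignments.
    count = [[0] * (m + 1) for _ in range(n + 1)]
    s1 = [[0.0] * (m + 1) for _ in range(n + 1)]
    s2 = [[0.0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            steps = []  # (previous i, previous j, score of the last column)
            if i > 0 and j > 0:
                c = match if x[i - 1] == y[j - 1] else mismatch
                steps.append((i - 1, j - 1, c))
            if i > 0:
                steps.append((i - 1, j, gap))
            if j > 0:
                steps.append((i, j - 1, gap))
            for pi, pj, c in steps:
                k = count[pi][pj]
                count[i][j] += k
                # Each path through the predecessor gains c, so
                # sum(s + c) = sum(s) + c*k and
                # sum((s + c)^2) = sum(s^2) + 2c*sum(s) + c^2*k.
                s1[i][j] += s1[pi][pj] + c * k
                s2[i][j] += s2[pi][pj] + 2 * c * s1[pi][pj] + c * c * k
    total = count[n][m]
    mean = s1[n][m] / total
    var = s2[n][m] / total - mean ** 2
    return total, mean, var


def z_score(true_score, mean, var):
    """Distance of the given alignment's score from the mean, in standard deviations."""
    return (true_score - mean) / math.sqrt(var)
```

As a small check, aligning "A" with "A" using match = 1, mismatch = -1 and gap = -1 gives three alignments with scores 1, -2 and -2, so the mean is -1, the variance is 2, and the matching alignment has a Z-score of sqrt(2), about 1.41. For long sequences the subtraction used to obtain the variance can lose precision in floating point; the sketch ignores this.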