Learning Significant Alignments: An Alternative to Normalized Local Alignment

Authors:
Eric Breimer; Mark Goldberg
Venue:
ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems
Year:
2002

References:
The Normalized String Editing Problem Revisited. IEEE Transactions on Pattern Analysis and Machine Intelligence
Human and mouse gene structure: comparative analysis and application to exon prediction. RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
A new approach to sequence comparison: normalized sequence alignment. RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Fast Computation of Normalized Edit Distances. IEEE Transactions on Pattern Analysis and Machine Intelligence
Parametric Recomputing in Alignment Graphs. CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
The Conserved Exon Method for Gene Finding. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
An Efficient Uniform-Cost Normalized Edit Distance Algorithm. SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware

Cited by:
Learning to align: a statistical approach. IDA'07 Proceedings of the 7th international conference on Intelligent data analysis

Abstract

We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. The O(n^2) Smith-Waterman algorithm, the prevalent tool for computing local sequence alignments, often outputs long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n^2 log n) algorithm that outputs a normalized local alignment, which maximizes the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, the algorithm can discover significant alignments that the Smith-Waterman algorithm would miss. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values, plus expert feedback to judge the usefulness of the resulting alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing sub-optimal Smith-Waterman alignments. Our algorithm runs in O(n^2) time and can discover biologically significant alignments without requiring expert feedback to produce meaningful results.
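
To make the contrast between total score and degree of similarity concrete, here is a minimal Python sketch. It is not the authors' learning method and not the Arslan et al. algorithm. It runs a textbook Smith-Waterman recurrence with a linear gap penalty, then rescores the single best alignment under an assumed normalized form, score / (len_a + len_b + L), where L stands for the normalization parameter. The function names, the scoring values (match=2, mismatch=-1, gap=-1), and the exact normalized formula are illustrative assumptions, not details taken from the abstract.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, len_a, len_b) for the best local alignment.

    len_a and len_b are the lengths of the aligned substrings of a and b.
    Scoring values are illustrative assumptions. Runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # score[i][j]: best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: cell at which that alignment begins (an empty alignment
    # at (i, j) begins at (i, j) itself)
    start = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (0, (i, j)),                                      # restart
                (score[i - 1][j - 1] + sub, start[i - 1][j - 1]),  # (mis)match
                (score[i - 1][j] + gap, start[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, start[i][j - 1]),          # gap in a
            ]
            s, origin = max(candidates, key=lambda c: c[0])
            score[i][j], start[i][j] = s, origin
            if s > best[0]:
                best = (s, i - origin[0], j - origin[1])
    return best


def normalized_score(score, len_a, len_b, L):
    """Assumed normalized form: total score divided by aligned length plus L."""
    return score / (len_a + len_b + L)


if __name__ == "__main__":
    s, la, lb = smith_waterman("XXACGTXX", "YYACGTYY")
    print(s, la, lb, normalized_score(s, la, lb, L=8))
```

Under this assumed form, a small L favors short, dense alignments and a large L approaches the ordering by total score, which is why choosing L by hand takes the repeated runs the abstract describes.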