The Normalized String Editing Problem Revisited
IEEE Transactions on Pattern Analysis and Machine Intelligence
Human and mouse gene structure: comparative analysis and application to exon prediction
RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
A new approach to sequence comparison: normalized sequence alignment
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Fast Computation of Normalized Edit Distances
IEEE Transactions on Pattern Analysis and Machine Intelligence
Parametric Recomputing in Alignment Graphs
CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
The Conserved Exon Method for Gene Finding
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
An Efficient Uniform-Cost Normalized Edit Distance Algorithm
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Learning to align: a statistical approach
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. The Smith-Waterman algorithm, the prevalent O(n^2) tool for computing local sequence alignments, often outputs long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n^2 log n) algorithm that outputs a normalized local alignment, which maximizes the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, that algorithm can discover significant alignments that the Smith-Waterman algorithm would miss. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values, plus expert feedback on the usefulness of the resulting alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing sub-optimal Smith-Waterman alignments. Our algorithm runs in O(n^2) time and can discover biologically significant alignments without requiring expert feedback.
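
To make the contrast between total score and degree of similarity concrete, here is a minimal Python sketch of Smith-Waterman local alignment with a length-normalized score computed afterwards. It is not the learning method proposed in the paper. The scoring values (match, mismatch, gap), the function names, and the normalized form score / (|a'| + |b'| + L), with L as the normalization parameter, are illustrative assumptions modeled on the usual statement of normalized local alignment.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment under a linear gap penalty (illustrative scores).

    Returns (score, (a_start, a_end), (b_start, b_end)) with half-open spans.
    Runs in O(len(a) * len(b)) time and space.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


def normalized_score(score, a_span, b_span, L):
    """Assumed normalized form: score / (|a'| + |b'| + L)."""
    len_a = a_span[1] - a_span[0]
    len_b = b_span[1] - b_span[0]
    return score / (len_a + len_b + L)


if __name__ == "__main__":
    score, a_span, b_span = smith_waterman("TTACGTT", "GGACGGG")
    print(score, a_span, b_span, normalized_score(score, a_span, b_span, L=4))
```

Note that this sketch only normalizes the single alignment that Smith-Waterman already selected. Finding the alignment that maximizes the normalized score itself requires a separate search, and the outcome depends on the choice of L, which is the parameter-selection difficulty the abstract describes.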