Procrastination leads to efficient filtration for local multiple alignment

Authors: Aaron E. Darling; Todd J. Treangen; Louxin Zhang; Carla Kuiken; Xavier Messeguer; Nicole T. Perna
Affiliations: Department of Computer Science, University of Wisconsin; Department of Computer Science, Technical University of Catalonia, Barcelona, Spain; Department of Mathematics, National University of Singapore, Singapore; T-10 Theoretical Biology Division, Los Alamos National Laboratory; Department of Computer Science, Technical University of Catalonia, Barcelona, Spain; Department of Animal Health and Biomedical Sciences, Genome Center, University of Wisconsin
Venue: WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics
Year: 2006



Abstract

We describe an efficient local multiple alignment filtration heuristic for identifying conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns that match both DNA strands simultaneously, (2) seed extension (chaining) in order of decreasing multiplicity, and (3) procrastination when low-multiplicity matches are encountered. The resulting local multiple alignments may contain nucleotide substitutions and internal gaps as large as $w$ characters in any occurrence of the motif. The algorithm consumes $\mathcal{O}(wN)$ memory and $\mathcal{O}(wN \log(wN))$ time, where $N$ is the sequence length. We score the significance of multiple alignments using entropy-based motif scoring methods. We demonstrate the performance of our filtration method on Alu-repeat-rich segments of the human genome and on a large set of hepatitis C virus genomes. The GPL implementation of our algorithm in C++ is called procrastAligner and is freely available from http://gel.ahabs.wisc.edu/procrastination.
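A minimal Python sketch can illustrate why a palindromic spaced seed pattern lets one lookup cover both DNA strands. It rests on several assumptions:

* The pattern `1101011` is arbitrary ('1' marks a position that must match, '0' is a don't-care).
* Input is uppercase ACGT.
* The helper names (`extract`, `canonical_key`, `seed_matches`) are hypothetical.

This is not procrastAligner's code.

The key property is that the pattern reads the same in both directions. As a result, the seed read from the reverse complement of a window is exactly the reverse complement of the seed read from the window. Keying every window by the smaller of those two strings therefore sends forward-strand and reverse-strand occurrences to the same bucket.

```python
# Sketch: strand-independent spaced-seed keys using a palindromic pattern.
# The pattern and all names below are illustrative assumptions, not the
# patterns or code used by procrastAligner. Input is assumed uppercase ACGT.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string over ACGT."""
    return s.translate(COMPLEMENT)[::-1]


def is_palindromic(pattern: str) -> bool:
    """A seed pattern is palindromic if it reads the same in both directions."""
    return pattern == pattern[::-1]


def extract(window: str, pattern: str) -> str:
    """Keep only the characters at the pattern's '1' (must-match) positions."""
    return "".join(c for c, p in zip(window, pattern) if p == "1")


def canonical_key(window: str, pattern: str) -> str:
    """Strand-independent key: the smaller of the seed and its reverse complement.

    Valid only for palindromic patterns, where extracting from the reverse
    complement of a window equals the reverse complement of the extraction.
    """
    seed = extract(window, pattern)
    return min(seed, revcomp(seed))


def seed_matches(seq: str, pattern: str) -> dict:
    """Group window start positions by canonical key (hash-based seed matching)."""
    if not is_palindromic(pattern):
        raise ValueError("pattern must be palindromic")
    span = len(pattern)
    groups: dict = {}
    for i in range(len(seq) - span + 1):
        groups.setdefault(canonical_key(seq[i:i + span], pattern), []).append(i)
    # Keys seen at two or more positions are candidate matches (multiplicity >= 2).
    return {k: v for k, v in groups.items() if len(v) > 1}
```

For example, `canonical_key("ACGTTGA", "1101011")` and `canonical_key(revcomp("ACGTTGA"), "1101011")` both return `"ACTGA"`. A forward occurrence and a reverse-complement occurrence of the same window therefore land in one group. With a non-palindromic pattern this equality does not hold in general, which is why the sketch rejects such patterns.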
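Entropy-based scoring of a multiple alignment can be sketched in a similar spirit. The block below computes the information content (relative entropy, in bits) of each alignment column against a background distribution, then sums it over all columns. The following choices are assumptions made only for illustration:

* a uniform ACGT background,
* ignoring gap characters,
* the function names.

The sketch yields a raw score. Turning it into a significance value, such as a P-value, needs additional statistics that this sketch does not attempt.

```python
import math
from collections import Counter

# Assumption: uniform background frequencies over uppercase ACGT.
# The paper's actual significance computation is not reproduced here.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def column_information(column: str) -> float:
    """Relative entropy (bits) of one alignment column against BACKGROUND.

    Characters outside ACGT (e.g. gap '-') are ignored; a column with no
    ACGT characters scores 0.
    """
    counts = Counter(c for c in column if c in BACKGROUND)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        (n / total) * math.log2((n / total) / BACKGROUND[b])
        for b, n in counts.items()
    )


def alignment_information(rows: list) -> float:
    """Sum of per-column information over a multiple alignment of equal-length rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    return sum(column_information("".join(col)) for col in zip(*rows))
```

For example, `alignment_information(["ACGT", "ACGA", "ACGC", "ACGG"])` returns 6.0. The three fully conserved columns contribute 2 bits each. The last column has all four bases equally represented and contributes 0.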