Designing seeds for similarity search in genomic DNA

Authors:
Jeremy Buhler; Uri Keich; Yanni Sun
Affiliations:
Jeremy Buhler, Yanni Sun: Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
Uri Keich: Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853, USA
Venue:
Journal of Computer and System Sciences - Special issue on bioinformatics II
Year:
2005

Abstract

Large-scale comparison of genomic DNA is of fundamental importance in annotating the functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods: Companion Methods Enzymol 266 (1996) 460, J. Mol. Biol. 215 (1990) 403, Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern, or "seed", of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed that optimizes the performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures of models and to inhomogeneous Markov models. We also give intuition and theoretical results about which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
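The automaton-based sensitivity computation summarized in the abstract can be illustrated with a short sketch. This is not the authors' Mandala code. It assumes the simplest case: an order-0 (i.i.d. Bernoulli) model of an ungapped alignment, written as a bit string (1 = matching base pair, 0 = mismatch), and a spaced seed written as 1s (required match) and 0s (don't care). The sketch builds an Aho-Corasick automaton that recognizes every alignment word the seed hits, then pushes probability mass through it one position at a time. Passing a list of per-position match probabilities is a simplified stand-in for the inhomogeneous models mentioned above, and the mixture helper averages over weighted components. All function names (`hit_words`, `build_automaton`, `sensitivity`, `mixture_sensitivity`) are hypothetical.

```python
from collections import deque
from itertools import product


def hit_words(seed):
    """All alignment words of length len(seed) that the seed matches.

    Assumption: seed uses '1' for a required match, '0' for don't care.
    """
    care = [c == "1" for c in seed]
    words = []
    for fill in product("01", repeat=care.count(False)):
        it = iter(fill)
        words.append("".join("1" if c else next(it) for c in care))
    return words


def build_automaton(words):
    """Aho-Corasick automaton over {'0', '1'}.

    Returns (delta, accept): delta[s][bit] is the next state, and
    accept[s] is True when some hit word ends in state s.
    """
    goto, accept = [{}], [False]
    for w in words:                      # build the trie of hit words
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                accept.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accept[s] = True
    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in "01":                      # root transitions
        t = goto[0].get(ch, 0)
        delta[0][ch] = t
        if t:
            queue.append(t)
    while queue:                         # BFS fills failure links
        s = queue.popleft()
        accept[s] = accept[s] or accept[fail[s]]
        for ch in "01":
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, accept


def sensitivity(seed, length, p):
    """Probability that a random alignment of `length` contains a seed hit.

    p is one match probability (Bernoulli model) or a sequence of
    per-position match probabilities (assumed position-dependent model).
    """
    probs = [p] * length if isinstance(p, (int, float)) else list(p)
    delta, accept = build_automaton(hit_words(seed))
    dist = [0.0] * len(delta)
    dist[0] = 1.0
    hit = 0.0
    for q in probs:
        new = [0.0] * len(delta)
        for s, mass in enumerate(dist):
            if mass == 0.0:
                continue
            for ch, w in (("1", q), ("0", 1.0 - q)):
                t = delta[s][ch]
                if accept[t]:
                    hit += mass * w      # absorb: a hit has occurred
                else:
                    new[t] += mass * w
        dist = new
    return hit


def mixture_sensitivity(seed, length, components):
    """Sensitivity under a mixture given as a list of (weight, p) pairs."""
    return sum(w * sensitivity(seed, length, p) for w, p in components)
```

For example, `sensitivity("11", 3, 0.5)` is 3/8, since three of the eight equally likely length-3 alignments (011, 110, 111) contain two adjacent matches. The number of hit words, and therefore the automaton, grows as 2 raised to the number of don't-care positions, which stays small for the short spaced seeds typically used in DNA search.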