Large-scale comparison of genomic DNA is of fundamental importance in annotating the functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods Enzymol. 266 (1996) 460; J. Mol. Biol. 215 (1990) 403; Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern, or "seed", of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed that optimizes the performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
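The abstract does not spell out the sensitivity computation, so the following is only a minimal sketch under stated assumptions, not Mandala's implementation. It assumes the simplest Markov model of an ungapped alignment, in which each column is independently a match with probability `p` (a 0th-order chain), and it uses a naive finite automaton whose states are the last w − 1 alignment columns. The function name `seed_sensitivity` and its parameters are hypothetical. A seed is written as a string where '1' marks a position that must match and '0' a "don't care"; sensitivity is the probability that at least one window of the alignment matches every '1' position.

```python
from collections import defaultdict


def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that an ungapped alignment of `length` columns contains
    at least one hit of `seed`.

    Assumed model (hypothetical sketch): each column is independently a
    match (1) with probability p or a mismatch (0) with probability 1 - p.
    In `seed`, '1' = position that must match, '0' = don't care.
    """
    w = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]

    # Automaton state = the most recent (up to w - 1) columns, oldest first.
    # `dist` maps each state to the probability of reaching it with no hit yet.
    dist = {(): 1.0}
    for _ in range(length):
        nxt = defaultdict(float)
        for hist, prob in dist.items():
            for col, p_col in ((1, p), (0, 1.0 - p)):
                window = hist + (col,)
                # A full window whose care positions all match is a seed hit;
                # that probability mass is absorbed and no longer tracked.
                if len(window) == w and all(window[i] for i in care):
                    continue
                # Keep only the last w - 1 columns as the next state.
                state = window[-(w - 1):] if w > 1 else ()
                nxt[state] += prob * p_col
        dist = nxt
    # Sensitivity = 1 - P(no hit anywhere in the alignment).
    return 1.0 - sum(dist.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight (11).
    print(seed_sensitivity("11111111111", 64, 0.7))
    print(seed_sensitivity("111010010100110111", 64, 0.7))
```

Because the state set can grow as 2^(w−1) in the seed span w, this naive automaton becomes slow for long spaced seeds (the 18-column seed above takes noticeably longer than the contiguous one); a more compact automaton that tracks only seed-relevant histories would avoid that. Extending the sketch to a first-order Markov model would mean conditioning `p_col` on the previous column, which the state already records whenever w > 1.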