Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST [3, 2] and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or "seed" of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses problems arising in seed design. We give the fastest known algorithm for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, as well as theoretical results on which seeds are good choices. We also describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
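To make the notion of seed sensitivity concrete, the following is a minimal sketch and not the algorithm of this work. It assumes a simpler model than the Markov model named above: every column of an ungapped alignment matches independently with probability p. A seed is written as a string of '1' (position must match) and '0' (position is ignored), and the function name seed_sensitivity is a hypothetical one chosen for illustration. The sketch tracks the probability of every recent match history that has not yet produced a seed hit.

```python
def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits somewhere in a random ungapped alignment.

    Assumption (not from the source): each of the `length` columns is a
    match independently with probability `p`.
    """
    span = len(seed)
    # Bitmask of required positions. The newest column sits in bit 0,
    # so seed[i] corresponds to bit (span - 1 - i) of the current window.
    need = 0
    for i, c in enumerate(seed):
        if c == "1":
            need |= 1 << (span - 1 - i)
    keep = (1 << (span - 1)) - 1  # keeps only the last span-1 columns

    # no_hit[h] = probability that the recent history is h and that
    # no window seen so far has satisfied the seed.
    no_hit = {0: 1.0}
    for t in range(1, length + 1):
        nxt = {}
        for hist, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (hist << 1) | bit  # the last `span` columns
                if t >= span and window & need == need:
                    continue  # seed hit: this mass leaves the no-hit set
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight.
    for s in ("1111", "110101"):
        print(s, seed_sensitivity(s, 32, 0.7))
```

The table of histories grows as 2^(span-1), so this sketch is only practical for short seeds; it is meant to illustrate what is being computed, not how to compute it quickly.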