A time-efficient, linear-space local similarity algorithm
Advances in Applied Mathematics
On the power of universal bases in sequencing by hybridization
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
Sequencing-by-hybridization at the information-theory bound: an optimal algorithm
RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
Better Filtering with Gapped q-Grams
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Optimizing Multiple Seeds for Protein Homology Search
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Designing seeds for similarity search in genomic DNA
Journal of Computer and System Sciences - Special issue on bioinformatics II
Vector seeds: An extension to spaced seeds
Journal of Computer and System Sciences - Special issue on bioinformatics II
Superiority and complexity of the spaced seeds
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
On the complexity of the spaced seeds
Journal of Computer and System Sciences
Optimal spaced seeds for faster approximate string matching
Journal of Computer and System Sciences
Superiority of Spaced Seeds for Homology Search
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Graph connectivity, partial words, and a theorem of Fine and Wilf
Information and Computation
Exact Distribution of a Spaced Seed Statistic for DNA Homology Detection
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Amino Acid Classification and Hash Seeds for Homology Search
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
On Subset Seeds for Protein Alignment
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design
Information Processing Letters
Masking patterns in sequences: A new class of motif discovery with don't cares
Theoretical Computer Science
CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Combinatorics on partial word correlations
Journal of Combinatorial Theory Series A
Road network reconstruction for organizing paths
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Spaced seeds design using perfect rulers
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Optimal probing patterns for sequencing by hybridization
WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics
Optimal spaced seeds for faster approximate string matching
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Rapid homology search with two-stage extension and daughter seeds
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
A unifying framework for seed sensitivity and its application to subset seeds
WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics
Seed design framework for mapping SOLiD reads
RECOMB'10 Proceedings of the 14th Annual international conference on Research in Computational Molecular Biology
Large-Scale DNA sequence analysis in the cloud: a stream-based approach
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Fast computation of good multiple spaced seeds
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
Design and analysis of periodic multiple seeds
Theoretical Computer Science
Genomics studies routinely depend on similarity searches that first find short seed matches (k contiguous matching bases) and then extend them. The choice of the seed length k reflects a tradeoff between search speed (a larger k reduces chance hits) and sensitivity (a smaller k finds weaker similarities). Ma et al. (Bioinformatics 18 (2002)) introduced the idea of using a single, deterministic, optimized spaced seed in this search process and demonstrated empirically that the optimal spaced seed quadruples the search speed without sacrificing sensitivity. Multiple randomly spaced patterns, spaced q-grams, and spaced probes were also studied by Califano and Rigoutsos (Technical Report, IBM T.J. Watson Research Center (1995)), Burkhardt and Kärkkäinen (CPM (2001)), and Buhler (Bioinformatics 17 (2001) 419), and in other applications (RECOMB (1999) 295; RECOMB (2000) 245). All were found to be better than their contiguous counterparts. In this paper we study some of the theoretical and practical aspects of optimal seeds. In particular, we demonstrate that the commonly used contiguous seed is in some sense the worst one, and we offer an algorithmic solution to the problem of finding the optimal seed.
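To make the notion of seed sensitivity concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes a simple model that the abstract does not specify: each position of an aligned region matches independently with probability p, and a seed is written as a string of '1' (position must match) and '0' (don't care). The function name `hit_probability` and the example seeds are hypothetical and chosen only for illustration.

```python
def hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits at least once in a random region.

    Assumed model (not taken from the paper): each of the `length`
    positions is a match with independent probability `p`.
    seed: string of '1' (must match) and '0' (don't care).
    """
    span = len(seed)
    if length < span:
        return 0.0
    required = [i for i, c in enumerate(seed) if c == "1"]

    # Each state is the tuple of the last (span - 1) match bits of a
    # prefix in which the seed has not hit yet; the value is its probability.
    states = {(): 1.0}
    hit = 0.0
    for _ in range(length):
        next_states = {}
        for suffix, prob in states.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                # A full window with all required positions matching is a hit;
                # that probability mass is absorbed and not extended further.
                if len(window) == span and all(window[i] for i in required):
                    hit += prob * weight
                    continue
                key = window[-(span - 1):] if span > 1 else ()
                next_states[key] = next_states.get(key, 0.0) + prob * weight
        states = next_states
    return hit


if __name__ == "__main__":
    # Hypothetical comparison: a contiguous seed and a spaced seed of the
    # same weight (number of '1' positions) on a region of length 32.
    for seed in ("1111", "110101"):
        print(seed, hit_probability(seed, 32, 0.7))
```

The number of states grows as 2^(span - 1), so this sketch is only practical for short seeds; it is meant for comparing a contiguous seed with spaced seeds of the same weight under the assumed model, not for searching for an optimal seed.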