Optimal spaced seeds were introduced to bioinformatics by the theoretical computer science community to increase the sensitivity of homology search. They now serve thousands of homology search queries daily. Although dozens of papers have been published on optimal spaced seeds since their invention, many fundamental questions remain unanswered. In this paper, we settle several open questions in this area. Specifically, we prove that when the length of a non-uniformly spaced seed is bounded by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. We then study the computation of the hit probability of a spaced seed and resolve three more open questions: (iii) computing the hit probability in a uniform homologous region is NP-hard; (iv) the computation admits a PTAS; and (v) the asymptotic hit probability is computable in time exponential in the seed length, independent of the length of the homologous region.
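To make the notion of hit probability concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the homologous region is modeled as n independent positions, each a match with the same probability p, and that a seed is written as a string of '1' (required match) and '0' (don't care). The function name `hit_probability` and its parameters are illustrative. The sketch tracks the last len(seed) positions as a bit window, so its running time grows exponentially with the seed length.

```python
def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits somewhere in a region of length n.

    Assumed model (illustrative): each region position is a match (1)
    with probability p, independently of the others. The seed hits at an
    offset if every '1' position of the seed is aligned with a match.
    """
    length = len(seed)
    # Window convention: bit 0 is the most recent region position, so
    # seed position i corresponds to window bit (length - 1 - i).
    seed_bits = sum(1 << (length - 1 - i) for i, c in enumerate(seed) if c == "1")
    mask = (1 << length) - 1

    # no_hit[w] = probability of having read the region so far, ending in
    # window w, without any seed hit yet.
    no_hit = {0: 1.0}
    for t in range(1, n + 1):
        nxt = {}
        for window, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                w = ((window << 1) | bit) & mask
                # A hit is only possible once a full window has been read.
                if t >= length and (w & seed_bits) == seed_bits:
                    continue  # this probability mass has hit; drop it
                nxt[w] = nxt.get(w, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a spaced seed with the consecutive seed of the same weight.
    spaced = hit_probability("1101", 20, 0.7)
    consecutive = hit_probability("111", 20, 0.7)
    print(f"spaced 1101: {spaced:.4f}  consecutive 111: {consecutive:.4f}")
```

The number of window states can reach 2 to the power of the seed length, so this sketch is only practical for short seeds; it does not implement the PTAS or the asymptotic computation mentioned in the abstract.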