Superiority and complexity of the spaced seeds

Authors:
Ming Li; Bin Ma; Louxin Zhang
Affiliations:
University of Waterloo, Waterloo, Ontario, Canada; University of Western Ontario, London, Ontario, Canada; National University of Singapore, Singapore
Venue:
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Year:
2006

Abstract

Optimal spaced seeds were introduced to bioinformatics by the theoretical computer science community to increase the sensitivity of homology search. They now serve thousands of homology search queries daily. Although dozens of papers on optimal spaced seeds have been published since their invention, many fundamental questions remain unanswered. In this paper, we settle several open questions in this area. Specifically, we prove that when the length of a non-uniformly spaced seed is bounded by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. We then study the computation of the hit probability of a spaced seed, solving three more open questions: (iii) computing the hit probability in a uniform homologous region is NP-hard; (iv) it admits a polynomial-time approximation scheme (PTAS); and (v) the asymptotic hit probability is computable in time exponential in the seed length, independent of the length of the homologous region.
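
To make the notion of hit probability concrete, the following is a minimal sketch, not the algorithms of the paper. It assumes the usual model in which a uniform homologous region is a 0/1 string of length n whose positions are matches (1) independently with probability p, and a seed is written as a string of '1' (must match) and '0' (don't care). The function name `hit_probability` and its parameters are illustrative choices. The dynamic program tracks the last span-1 region bits, so it may visit up to 2^(span-1) states per position, i.e. its running time can grow exponentially with the seed length.

```python
def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits somewhere in a random region of length n.

    Assumed model (not taken from the source): each region position is a
    match (1) independently with probability p. A seed hits at an offset
    if every '1' position of the seed is aligned with a match.
    """
    span = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]

    # no_hit maps the most recent (span - 1) region bits to the probability
    # of having generated that suffix without any hit so far.
    no_hit = {(): 1.0}
    for _ in range(n):
        nxt = {}
        for suffix, prob in no_hit.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if len(window) == span and all(window[i] for i in care):
                    continue  # the seed hits here: this mass leaves no_hit
                key = window[-(span - 1):] if span > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + prob * weight
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Illustrative parameters only: a consecutive and a spaced seed of weight 3.
    for s in ("111", "1101"):
        print(s, hit_probability(s, n=32, p=0.7))
```

For tiny inputs the result can be checked by hand: with p = 0.5, the seed "101" hits a region of length 4 at offset 0 or offset 1, two independent events of probability 1/4 each, giving 1 - (3/4)^2 = 0.4375.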