In homology search, good spaced seeds achieve higher sensitivity than consecutive seeds at the same cost (weight). However, two problems remain open: explaining the mechanism that gives spaced seeds their power, and characterizing optimal spaced seeds. This paper investigates both questions by formally analyzing the average number of non-overlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a non-uniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. This answers the first question in the Bernoulli sequence model. The theoretical study in this paper also gives a new solution to the problem of finding long optimal seeds.
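To make the notion of hit probability in the Bernoulli sequence model concrete, the following is a minimal sketch and not the paper's method. It assumes a seed is written as a string of `1` (required match) and `*` (don't care) that starts and ends with `1`, that each position of a length-`n` region is a match independently with probability `p`, and that a hit means every `1` of the seed aligns with a match at some offset. The function name `hit_probability` and this encoding are illustrative choices. The computation is a plain dynamic program over the last `len(seed) - 1` match/mismatch outcomes, so it is only practical for short seeds.

```python
def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits a random 0/1 region of length n.

    Assumptions (illustrative, not taken from the paper):
    - seed uses '1' for a required match and '*' for a don't-care position,
      and starts and ends with '1';
    - each region position is a match (1) independently with probability p.
    """
    if not seed or set(seed) - {"1", "*"}:
        raise ValueError("seed must be a non-empty string over '1' and '*'")
    if seed[0] != "1" or seed[-1] != "1":
        raise ValueError("seed must start and end with '1'")

    length = len(seed)
    # Bit (length - 1 - j) of a window corresponds to seed position j,
    # so the most recently read region position is bit 0.
    mask = 0
    for j, symbol in enumerate(seed):
        if symbol == "1":
            mask |= 1 << (length - 1 - j)
    low_bits = (1 << (length - 1)) - 1

    # no_hit maps "last length-1 outcomes" to the probability of reaching
    # that state without any hit so far. Positions before the region start
    # are treated as mismatches; because seed[0] is '1', they cannot
    # contribute to a hit.
    no_hit = {0: 1.0}
    for _ in range(n):
        next_no_hit = {}
        for state, prob in no_hit.items():
            for bit, bit_prob in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if (window & mask) == mask:
                    continue  # the seed hits here; drop this probability mass
                key = window & low_bits
                next_no_hit[key] = next_no_hit.get(key, 0.0) + prob * bit_prob
        no_hit = next_no_hit
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a consecutive seed with a spaced seed of the same weight (3).
    # The region length and match probability are arbitrary example values.
    for s in ("111", "11*1"):
        print(s, hit_probability(s, 64, 0.7))
```

The state space grows as 2^(len(seed) - 1), which is why this sketch suits only short seeds; it is meant to illustrate the quantity being analyzed, not to search for long optimal seeds.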