Superiority of Spaced Seeds for Homology Search

Authors:
Louxin Zhang
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2007

Citing 9
Cited 2

Designing seeds for similarity search in genomic DNA

RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Motif Statistics

ESA '99 Proceedings of the 7th Annual European Symposium on Algorithms
Better filtering with gapped q-grams

Fundamenta Informaticae - Special issue on computing patterns in strings
Designing multiple simultaneous seeds for DNA similarity search

RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Sensitivity analysis and efficient method for identifying optimal spaced seeds

Journal of Computer and System Sciences
On spaced seeds for similarity search

Discrete Applied Mathematics
Efficient Methods for Generating Optimal Single and Multiple Spaced Seeds

BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Good spaced seeds for homology search

Bioinformatics
A unifying framework for seed sensitivity and its application to subset seeds

WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics

Rapid sequence homology assessment by subsampling the genome space using difference sets

IEEE Transactions on Information Theory - Special issue on information theory in molecular biology and neuroscience
Alignment seeding strategies using contiguous pyrimidine purine matches

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Abstract

In homology search, good spaced seeds achieve higher sensitivity than consecutive seeds at the same cost (weight). However, elucidating the mechanism that gives spaced seeds their power and characterizing optimal spaced seeds remain open problems. This paper investigates these two questions by formally analyzing the average number of non-overlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a non-uniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. This answers the first question in the Bernoulli sequence model. The theoretical study also yields a new approach to finding long optimal seeds.
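
To make the notion of hit probability in the Bernoulli sequence model concrete, the following minimal sketch computes it for a small seed. It is an illustration under stated assumptions, not the paper's method. The seed is assumed to be written as a string in which '1' marks a required match and '*' a don't-care position. The alignment is assumed to be n independent columns, each a match with probability p. The function name hit_probability and the example values n = 64 and p = 0.7 are hypothetical choices made for this illustration. The computation is a plain dynamic program over the most recent columns, so it is only practical for short seeds.

```python
def hit_probability(seed, n, p):
    """Probability that `seed` hits a random alignment of n columns at least once.

    Assumptions for this sketch: `seed` uses '1' for a required match and any
    other character (e.g. '*') for a don't-care position; each column is a
    match independently with probability p (Bernoulli model).
    """
    span = len(seed)
    # Bit mask of required-match positions; bit 0 is the most recent column.
    mask = 0
    for i, c in enumerate(reversed(seed)):
        if c == "1":
            mask |= 1 << i
    full = (1 << span) - 1

    # states maps "last `span` columns, no hit so far" -> probability mass.
    states = {0: 1.0}
    for t in range(1, n + 1):
        nxt = {}
        for window, mass in states.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                w = ((window << 1) | bit) & full
                # A hit needs a full window and a match at every '1' position.
                if t >= span and (w & mask) == mask:
                    continue  # this mass has hit; drop it from the no-hit set
                nxt[w] = nxt.get(w, 0.0) + mass * q
        states = nxt
    return 1.0 - sum(states.values())


if __name__ == "__main__":
    # Compare a consecutive seed with a spaced seed of the same weight (3).
    for seed in ("111", "11*1"):
        print(seed, hit_probability(seed, n=64, p=0.7))
```

The number of states grows as 2 to the power of the seed length, so this sketch suits small seeds only. The theorem above concerns the asymptotic hit probability, so a comparison at one fixed n need not favor the spaced seed. For very short regions the consecutive seed has more positions at which it can hit.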