New algorithms for the spaced seeds

Authors:
Xin Gao;Shuai Cheng Li;Yinan Lu
Affiliations:
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada;David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada;David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada and College of Computer Science and Technology of Jilin University, Changchun, Jilin Province, China
Venue:
FAW'07 Proceedings of the 1st annual international conference on Frontiers in algorithmics
Year:
2007

Abstract

The best known algorithm computes the sensitivity of a given spaced seed on a random region in O((M+L)|B|) time, where M is the length of the seed, L is the length of the random region, and |B| is the size of the seed-compatible-suffix set, which is exponential in the number of 0's in the seed. We developed two algorithms to improve this running time: the first runs in O(|B′|^2 ML) time, where B′ is a subset of B; the second runs in O((M|B|)^2.236 log(L/M)) time, which is much smaller than the original running time when L is large. We also developed a Monte Carlo algorithm that is guaranteed to quickly find a near-optimal seed with high probability.
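
To make the quantity being computed concrete, the following is a minimal Python sketch of seed sensitivity. It assumes an i.i.d. Bernoulli model in which each position of the random region is a match (1) with probability p; the abstract does not state the model, so this is an assumption. The function names are hypothetical. The dynamic program tracks the last M-1 positions of the region and is therefore exponential in M. It is a naive baseline for illustration only, not any of the algorithms summarized above.

```python
from itertools import product


def seed_hits(seed: str, region) -> bool:
    """True if the seed matches the 0/1 region at some offset.

    A '1' in the seed requires a match at that position; a '0' is a
    don't-care position.
    """
    m = len(seed)
    return any(
        all(region[i + j] == 1 for j, c in enumerate(seed) if c == "1")
        for i in range(len(region) - m + 1)
    )


def sensitivity_dp(seed: str, length: int, p: float) -> float:
    """Probability that the seed hits a random region of the given length.

    Assumes each region position is 1 independently with probability p.
    """
    m = len(seed)
    # Bit m-1-j of a window corresponds to seed position j (oldest bit is highest).
    care = sum(1 << (m - 1 - j) for j, c in enumerate(seed) if c == "1")
    keep = (1 << (m - 1)) - 1  # mask that retains the last m-1 positions
    no_hit = {0: 1.0}  # state (last m-1 bits) -> probability of no hit so far
    for t in range(1, length + 1):
        nxt = {}
        for state, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if t >= m and window & care == care:
                    continue  # the seed hits here; drop this mass
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


def sensitivity_brute(seed: str, length: int, p: float) -> float:
    """Same quantity by enumerating all 2**length regions (tiny inputs only)."""
    total = 0.0
    for region in product((0, 1), repeat=length):
        if seed_hits(seed, region):
            ones = sum(region)
            total += p**ones * (1.0 - p) ** (length - ones)
    return total
```

For small inputs the two functions can be checked against each other, for example with the seed 1101 on a region of length 10 and p = 0.7. The brute-force version is only practical for very short regions.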