Designing multiple simultaneous seeds for DNA similarity search

Authors:
Yanni Sun;Jeremy Buhler
Affiliations:
Washington University, St. Louis, MO;Washington University, St. Louis, MO
Venue:
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Year:
2004

Citing 7
Cited 17

Reconfigurable computing: a survey of systems and software

ACM Computing Surveys (CSUR)
Encyclopedia of Artificial Intelligence

Encyclopedia of Artificial Intelligence
Significance of inter-species matches when evolutionary rate varies

Proceedings of the sixth annual international conference on Computational biology
Designing seeds for similarity search in genomic DNA

RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Transforming men into mice: the Nadeau-Taylor chromosomal breakage model revisited

RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
FLASH: A Fast Look-Up Algorithm for String Homology

Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology
Optimal spaced seeds for hidden Markov models, with application to homologous coding regions

CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching

Optimizing Multiple Seeds for Protein Homology Search

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Multiseed Lossless Filtration

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Designing seeds for similarity search in genomic DNA

Journal of Computer and System Sciences - Special issue on bioinformatics II
Superiority and complexity of the spaced seeds

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
On the complexity of the spaced seeds

Journal of Computer and System Sciences
Superiority of Spaced Seeds for Homology Search

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Amino Acid Classification and Hash Seeds for Homology Search

BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Designing Patterns and Profiles for Faster HMM Search

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
On Subset Seeds for Protein Alignment

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
GAME: A simple and efficient whole genome alignment method using maximal exact match filtering

Computational Biology and Chemistry
Optimal spaced seeds for faster approximate string matching

ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Rapid homology search with two-stage extension and daughter seeds

COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
A unifying framework for seed sensitivity and its application to subset seeds

WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics
NcRNA homology search using Hamming distance seeds

Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Designing Filters for Fast-Known NcRNA Identification

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Alignment seeding strategies using contiguous pyrimidine purine matches

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Fast computation of good multiple spaced seeds

WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics

Abstract

The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of sequences sharing a common short "seed," or pattern of matching residues. Some of these changes raise the possibility of improving search performance by probing sequence pairs with several distinct seeds, any one of which is sufficient for a seed match. However, designing a set of seeds to maximize their combined sensitivity to biologically meaningful sequence alignments is computationally difficult, even given recent advances [16, 6] in designing single seeds.

This work describes algorithmic improvements to seed design that address the problem of designing a set of n seeds to be used simultaneously. We give a new local search method to optimize the sensitivity of seed sets. The method relies on efficient incremental computation of the probability that an alignment contains a match to a seed π, given that it has already failed to match any of the seeds in a set Π. We demonstrate experimentally that multi-seed designs, even with relatively few seeds, can be significantly more sensitive than even optimized single-seed designs.
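
To make the conditional probability in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a simple model in which each alignment position is a match independently with probability p, writes a seed as a string of `1` (position must match) and `0` (don't care), and computes sensitivities by brute-force enumeration of all alignments of a short fixed length rather than by the efficient incremental computation the paper describes. The function names `hits`, `sensitivity`, and `conditional_gain` are hypothetical.

```python
from itertools import product


def hits(seed, aln):
    """Return True if the spaced seed matches the 0/1 alignment at some offset.

    seed: string over {'1', '0'}; '1' requires a match, '0' is a don't-care.
    aln:  tuple of 0/1 values, 1 meaning the aligned residues match.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    last_start = len(aln) - len(seed)
    return any(
        all(aln[start + i] for i in required)
        for start in range(last_start + 1)
    )


def sensitivity(seeds, length, p):
    """Probability that at least one seed in `seeds` hits a random alignment.

    Assumes i.i.d. match probability p per position (an illustrative model)
    and enumerates all 2**length alignments, so it only suits small lengths.
    """
    total = 0.0
    for aln in product((0, 1), repeat=length):
        if any(hits(s, aln) for s in seeds):
            k = sum(aln)
            total += p ** k * (1 - p) ** (length - k)
    return total


def conditional_gain(new_seed, seeds, length, p):
    """P(new_seed hits | no seed already in `seeds` hits)."""
    base = sensitivity(seeds, length, p)
    combined = sensitivity(list(seeds) + [new_seed], length, p)
    return (combined - base) / (1.0 - base)
```

Under these assumptions, a local search step could score a candidate seed π by `conditional_gain(pi, current_set, length, p)` and keep the candidate with the largest value; the enumeration here is exponential in the alignment length and serves only to show what quantity is being optimized.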