Vector seeds: An extension to spaced seeds

Authors:
Broňa Brejová;Daniel G. Brown;Tomáš Vinař
Affiliations:
School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada (all three authors)
Venue:
Journal of Computer and System Sciences - Special issue on bioinformatics II
Year:
2005

Abstract

We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core regions of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.
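
The abstract does not spell out what a vector seed is, so the following is only a minimal sketch under stated assumptions. It assumes the usual formulation in which a vector seed is a pair (v, T) of a weight vector and a threshold, and an alignment, given as a sequence of per-position scores, is hit at position k when the weighted sum of the scores starting at k reaches T. The sensitivity estimate uses plain Monte Carlo sampling under a simple Bernoulli match/mismatch model, which is a stand-in and not the efficient algorithm the authors describe. The function names, the `min_hits` parameter, and the 0/1 scoring are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def seed_hits(scores: Sequence[float], v: Sequence[float], threshold: float) -> List[int]:
    """Return every position k where sum_i v[i] * scores[k + i] >= threshold.

    Assumed definition of a vector seed hit; not taken from the abstract itself.
    """
    span = len(v)
    return [
        k
        for k in range(len(scores) - span + 1)
        if sum(w * s for w, s in zip(v, scores[k:k + span])) >= threshold
    ]


def estimate_sensitivity(
    v: Sequence[float],
    threshold: float,
    length: int,
    p_match: float,
    trials: int,
    rng: random.Random,
    min_hits: int = 1,
) -> float:
    """Monte Carlo estimate of the fraction of random alignments that are detected.

    Assumptions (illustrative only): each alignment position is an independent
    match (score 1) with probability p_match, otherwise a mismatch (score 0),
    and an alignment counts as detected when it contains at least min_hits hit
    positions. Real multi-hit rules may add constraints such as non-overlap.
    """
    detected = 0
    for _ in range(trials):
        scores = [1 if rng.random() < p_match else 0 for _ in range(length)]
        if len(seed_hits(scores, v, threshold)) >= min_hits:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    # The spaced seed 1101 written as a vector seed: weights (1, 1, 0, 1), T = 3.
    rng = random.Random(0)
    print(estimate_sensitivity((1, 1, 0, 1), 3, length=64, p_match=0.7, trials=2000, rng=rng))
```

Under these assumptions an ordinary spaced seed is the special case with 0/1 weights and a threshold equal to the number of ones; raising `min_hits` mimics, in simplified form, requiring multiple hits before an alignment is reported.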