Vector seeds: An extension to spaced seeds

  • Authors:
  • Broňa Brejová; Daniel G. Brown; Tomáš Vinař

  • Affiliations:
  • School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada (all three authors)

  • Venue:
  • Journal of Computer and System Sciences - Special issue on bioinformatics II
  • Year:
  • 2005

Abstract

We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core regions of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.
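To make the seed idea concrete, here is a minimal sketch. It assumes the usual formulation of a vector seed as a weight vector v with a threshold T. The alignment is encoded as per-column scores (1 for a match, 0 for a mismatch), and the seed hits at position i when the weighted sum of the window starting at i reaches T. The function names `seed_hits` and `hit_probability` are hypothetical, and the columns are assumed to be i.i.d. Bernoulli(p). The dynamic program over the last len(v) - 1 columns illustrates sensitivity estimation under such a model. It is not the authors' algorithm, and their models (for example, for coding DNA) may differ.

```python
from itertools import product


def _window_score(v, window):
    """Weighted score of one alignment window under seed weights v."""
    return sum(wt * c for wt, c in zip(v, window))


def seed_hits(v, T, a):
    """Start positions where the vector seed (v, T) hits alignment a.

    a is a sequence of per-column scores (assumed: 1 = match, 0 = mismatch).
    """
    l = len(v)
    return [i for i in range(len(a) - l + 1)
            if _window_score(v, a[i:i + l]) >= T]


def hit_probability(v, T, n, p):
    """Exact probability that an i.i.d. Bernoulli(p) 0/1 alignment of
    length n contains at least one hit of (v, T).

    Dynamic program over the last len(v) - 1 columns: dist[w] is the
    probability of ending in window w without any hit so far.
    """
    l = len(v)
    if n < l:
        return 0.0
    dist = {}
    for w in product((0, 1), repeat=l - 1):
        ones = sum(w)
        dist[w] = p ** ones * (1 - p) ** (l - 1 - ones)
    # Each step appends one column and checks the completed window.
    for _ in range(n - l + 1):
        new = {}
        for w, pr in dist.items():
            for x, px in ((1, p), (0, 1 - p)):
                window = w + (x,)
                if _window_score(v, window) >= T:
                    continue  # hit found: this path is no longer "hit-free"
                nxt = window[1:]
                new[nxt] = new.get(nxt, 0.0) + pr * px
        dist = new
    return 1.0 - sum(dist.values())
```

For example, the spaced seed 11011 can be written as the vector seed v = (1, 1, 0, 1, 1) with T = 4. `hit_probability(v, 4, 64, 0.7)` then estimates sensitivity on length-64 alignments with 70% identity, both figures chosen only for illustration. A specificity-style quantity is the chance that a single window hits under a background match probability. An example is `hit_probability(v, 4, len(v), 0.25)` for unrelated DNA, again an assumed model. The state space grows as 2^(len(v) - 1), so this direct version suits only short seeds.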