Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata

  • Authors:
  • Inke Herms; Sven Rahmann

  • Affiliations:
  • Genome Informatics, Faculty of Technology, Bielefeld University, Germany; Bioinformatics for High-Throughput Technologies, Computer Science 11, TU Dortmund, D-44221 Dortmund, Germany

  • Venue:
  • WABI '08: Proceedings of the 8th International Workshop on Algorithms in Bioinformatics
  • Year:
  • 2008

Abstract

Heuristic sequence alignment and database search algorithms, such as PatternHunter and BLAST, are based on the initial discovery of so-called alignment seeds, short well-conserved alignment patterns that are subsequently extended to full local alignments. In recent years, the theory of classical seeds (matching contiguous q-grams) has been extended to spaced seeds, which allow mismatches within a seed, and subsequently to indel seeds, which allow gaps in the underlying alignment. Seeds within a given class are usually compared by their sensitivity, that is, the probability of matching an alignment generated from a particular probabilistic alignment model. We present a flexible, exact, unifying framework for seed sensitivity computation, called probabilistic arithmetic automata, that subsumes all previous results on spaced and indel seeds. In addition, it easily incorporates sets of arbitrary seeds. Instead of computing only the probability of at least one hit (the standard definition of sensitivity), we can optionally provide the entire distribution of the number of overlapping or non-overlapping seed hits, which yields a different characterization of a seed. A symbolic representation allows fast computation for any set of parameters.
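To make "sensitivity" and "distribution of overlapping hits" concrete, here is a minimal sketch for a single spaced seed. It assumes the simplest alignment model (each column is independently a match, written 1, with probability p, otherwise a mismatch, written 0) and no indels. It is not the authors' probabilistic arithmetic automaton construction: the automaton here is a brute-force one whose state is simply the last span(seed) columns packed into an integer, paired with a running hit counter, which is the "arithmetic" part. The function names `hit_count_distribution` and `sensitivity` are hypothetical.

```python
from collections import defaultdict


def hit_count_distribution(seed: str, n: int, p: float) -> list[float]:
    """Distribution of the number of overlapping hits of a spaced seed
    (e.g. "1101", where '1' = must match, '0' = don't care) in a random
    alignment of length n.

    Assumed model: columns are i.i.d., match ('1') with probability p.
    Automaton state: the last len(seed) columns packed into an int.
    """
    span = len(seed)
    full = (1 << span) - 1
    # Bit (span - 1 - j) of the window holds the column aligned with seed[j].
    care = sum(1 << (span - 1 - j) for j, c in enumerate(seed) if c == "1")
    # dist[(window, hits)] = probability of that configuration so far.
    dist = {(0, 0): 1.0}
    for t in range(1, n + 1):
        nxt = defaultdict(float)
        for (window, hits), prob in dist.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                w = ((window << 1) | bit) & full
                # A hit ends at column t once a full window has been read.
                h = hits + int(t >= span and (w & care) == care)
                nxt[(w, h)] += prob * q
        dist = nxt
    max_hits = max(0, n - span + 1)
    out = [0.0] * (max_hits + 1)
    for (_, hits), prob in dist.items():
        out[hits] += prob
    return out


def sensitivity(seed: str, n: int, p: float) -> float:
    """Probability of at least one hit (the standard notion of sensitivity)."""
    return 1.0 - hit_count_distribution(seed, n, p)[0]
```

For example, `sensitivity("11", 3, 0.5)` returns 0.375: of the eight equally likely length-3 alignments, only 011, 110 and 111 contain two consecutive matches. Because this sketch keeps one state per possible window, its state space grows as 2^span, so it is only practical for short seeds; a smaller automaton that recognizes just the seed pattern would cut that down considerably.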