Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics

Authors:
Tobias Marschall; Sven Rahmann
Affiliations:
Bioinformatics for High-Throughput Technologies at the Chair of Algorithm Engineering, Computer Science Department, TU Dortmund, Dortmund, Germany D-44221 (both authors)
Venue:
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Year:
2008

Abstract

We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithms in a generic manner, our main concern is their application to pattern matching statistics, i.e., we study the distribution of the number of occurrences of a pattern under a given text model. Such calculations play an important role in computational biology, as they give access to the significance of pattern occurrences. To assess the practicability of our method, we apply it to the Prosite database of amino acid motifs and to the Jaspar database of transcription factor binding sites. Regarding the latter, we additionally show that our framework allows binding affinities predicted from a physical model to be taken into account.
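
To make the idea of an exact occurrence-count distribution concrete, the following is a minimal sketch and not the authors' construction or either of their two algorithms. It assumes a single pattern, overlapping occurrences counted separately, and an i.i.d. text model. The function names build_dfa and occurrence_distribution are hypothetical. A deterministic automaton tracks how much of the pattern has been matched, and a dynamic program propagates the joint probability of (automaton state, number of matches so far) one character at a time.

```python
from collections import defaultdict


def build_dfa(pattern, alphabet):
    """Matching automaton: state q is the length of the longest pattern
    prefix that is a suffix of the text read so far."""
    m = len(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            s = pattern[:q] + c
            # Longest k such that pattern[:k] is a suffix of s.
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            dfa[q][c] = k
    return dfa


def occurrence_distribution(pattern, char_probs, n):
    """Exact distribution of the number of (overlapping) occurrences of
    `pattern` in a random text of length n with i.i.d. characters
    (the i.i.d. text model is an assumption of this sketch)."""
    m = len(pattern)
    dfa = build_dfa(pattern, sorted(char_probs))
    # dist[(state, count)] = probability of being in `state` after the
    # characters read so far, having seen `count` occurrences.
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, k), p in dist.items():
            for c, pc in char_probs.items():
                q2 = dfa[q][c]
                # Entering the accepting state means one more occurrence:
                # the running value is updated by adding 1.
                k2 = k + (1 if q2 == m else 0)
                nxt[(q2, k2)] += p * pc
        dist = nxt
    # Marginalize over automaton states to get the count distribution.
    result = defaultdict(float)
    for (_, k), p in dist.items():
        result[k] += p
    return dict(result)


dist = occurrence_distribution("aa", {"a": 0.5, "b": 0.5}, 3)
```

For the pattern "aa" over a uniform two-letter alphabet and text length 3, the eight equally likely texts contain 0, 1, or 2 occurrences with probabilities 5/8, 2/8, and 1/8, which is what the sketch returns. Tracking the count alongside the automaton state is what keeps the computation exact; the table grows with the number of states times the maximum count rather than with the number of possible texts.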