Probabilistic Arithmetic Automata and Their Applications
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithms in a generic manner, our main concern is their application to pattern matching statistics, i.e., we study the distribution of the number of occurrences of a pattern under a given text model. Such calculations play an important role in computational biology, as they make it possible to quantify the significance of pattern occurrences. To assess the practicability of our method, we apply it to the Prosite database of amino acid motifs and to the Jaspar database of transcription factor binding sites. Regarding the latter, we additionally show that our framework allows binding affinities predicted from a physical model to be taken into account.
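To illustrate the kind of calculation described above, the following is a minimal sketch, not the authors' implementation. It assumes a single pattern, an i.i.d. text model, and overlapping occurrences counted separately; the function names `build_dfa` and `occurrence_distribution` are hypothetical. A deterministic automaton tracks how much of the pattern has been matched, and a dynamic program over pairs (automaton state, occurrence count) yields the exact distribution of the number of occurrences in a random text of length n.

```python
from collections import defaultdict


def build_dfa(pattern, alphabet):
    """Matching automaton for one pattern.

    State q is the length of the longest prefix of `pattern` that is a
    suffix of the text read so far; state len(pattern) signals a match.
    """
    m = len(pattern)

    def step(state, ch):
        s = pattern[:state] + ch
        # Longest suffix of s that is also a prefix of the pattern.
        for k in range(min(m, len(s)), -1, -1):
            if s.endswith(pattern[:k]):
                return k

    return {(q, c): step(q, c) for q in range(m + 1) for c in alphabet}


def occurrence_distribution(pattern, char_probs, n):
    """Exact distribution of the occurrence count in an i.i.d. text of length n.

    `char_probs` maps each character to its probability (assumed to sum to 1).
    """
    m = len(pattern)
    delta = build_dfa(pattern, list(char_probs))
    # Joint distribution over (automaton state, number of matches so far).
    joint = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, count), p in joint.items():
            for c, pc in char_probs.items():
                q2 = delta[(q, c)]
                # The "arithmetic" step: add 1 whenever a match state is entered.
                nxt[(q2, count + (1 if q2 == m else 0))] += p * pc
        joint = nxt
    # Marginalize out the automaton state.
    result = defaultdict(float)
    for (_, count), p in joint.items():
        result[count] += p
    return dict(result)


if __name__ == "__main__":
    dist = occurrence_distribution("aa", {"a": 0.5, "b": 0.5}, 3)
    print(sorted(dist.items()))  # [(0, 0.625), (1, 0.25), (2, 0.125)]
```

The running time of this sketch grows with the text length, the number of automaton states, the alphabet size, and the number of distinct counts. Richer text models or sets of patterns would need a different automaton and transition probabilities, which this sketch does not cover.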