Regexpcount, a symbolic package for counting problems on regular expressions and words

Authors:
Pierre Nicodème
Affiliations:
LIX - École polytechnique, 91128 Palaiseau cedex, France
Venue:
Fundamenta Informaticae - Computing Patterns in Strings
Year:
2003

References

From regular expressions to deterministic automata. Theoretical Computer Science.
The distribution of subword counts is usually normal. European Journal of Combinatorics.
Automata and formal languages: an introduction.
A unified approach to word statistics. RECOMB '98: Proceedings of the Second Annual International Conference on Computational Molecular Biology.
A unified approach to word occurrence probabilities. Discrete Applied Mathematics, Special Volume on Combinatorial Molecular Biology.
Average Case Analysis of Algorithms on Sequences.
Automata and Computability.
Motif statistics. Theoretical Computer Science.
A Statistical Method for Finding Transcription Factor Binding Sites. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology.

Abstract

In previous work [10], we considered algorithms related to the statistics of matches of words and regular expressions in texts generated by Bernoulli or Markov sources. In this work, these algorithms are extended in two directions: to determine the joint statistics of the counts of several motifs, and to compute the waiting time until the first match with a motif in a model that may be constrained. The extension also handles matches with errors. The package is fully implemented and provides both high-level and low-level commands. We also consider an example arising from a practical biological problem: obtaining the statistics of the number of matches of words of length 8 in a genome (a Markovian sequence), given that an overrepresented, DNA-protecting pattern named Chi occurs a given number of times.
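A standard way to obtain such statistics is to build a deterministic automaton that reads the text and signals each time a motif ends, then to follow the source's letter probabilities along that automaton. The Python sketch below illustrates the idea under simplifying assumptions: a Bernoulli (memoryless) source with strictly positive letter probabilities, motifs given as plain words rather than regular expressions, overlapping matches counted, and exact numerical computation by dynamic programming and a linear solve rather than symbolically, as the package does. The function names `build_automaton`, `joint_count_distribution`, and `expected_waiting_time` are hypothetical and are not part of Regexpcount's interface. Markov sources, constrained models, and matches with errors are not covered.

```python
from collections import defaultdict, deque
from fractions import Fraction


def build_automaton(motifs, alphabet):
    """Aho-Corasick DFA (hypothetical helper). delta[s][c] is total on `alphabet`;
    out[s][i] counts the occurrences of motifs[i] that end when state s is entered."""
    k = len(motifs)
    goto, out = [{}], [[0] * k]
    for i, word in enumerate(motifs):  # trie of all motifs
        s = 0
        for c in word:
            if c not in goto[s]:
                goto.append({})
                out.append([0] * k)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s][i] += 1
    fail, delta, queue = [0] * len(goto), [{} for _ in goto], deque()
    for c in alphabet:  # depth-1 states have the root as failure link
        delta[0][c] = goto[0].get(c, 0)
        if c in goto[0]:
            queue.append(goto[0][c])
    while queue:  # BFS, so every failure target is completed first
        s = queue.popleft()
        f = fail[s]
        out[s] = [a + b for a, b in zip(out[s], out[f])]  # inherit suffix matches
        for c in alphabet:
            if c in goto[s]:
                t = goto[s][c]
                fail[t], delta[s][c] = delta[f][c], t
                queue.append(t)
            else:
                delta[s][c] = delta[f][c]
    return delta, out


def joint_count_distribution(motifs, probs, n):
    """Exact joint law of the overlapping match counts (N_1, ..., N_k) in a
    text of length n drawn from a Bernoulli source with letter probs `probs`."""
    delta, out = build_automaton(motifs, list(probs))
    dist = {(0, (0,) * len(motifs)): Fraction(1)}  # (state, counts) -> prob
    for _ in range(n):
        step = defaultdict(Fraction)
        for (s, counts), p in dist.items():
            for c, pc in probs.items():
                t = delta[s][c]
                new_counts = tuple(a + b for a, b in zip(counts, out[t]))
                step[(t, new_counts)] += p * pc
        dist = step
    law = defaultdict(Fraction)
    for (_, counts), p in dist.items():  # marginalize out the automaton state
        law[counts] += p
    return dict(law)


def expected_waiting_time(motifs, probs):
    """E[T], T = length of the shortest prefix containing a match of some motif.
    Solves E_s = 1 + sum_c p_c E_{delta(s, c)} over match-free states s."""
    delta, out = build_automaton(motifs, list(probs))
    live = [s for s in range(len(delta)) if not any(out[s])]
    idx = {s: i for i, s in enumerate(live)}
    m = len(live)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # augmented matrix
    for s in live:
        i = idx[s]
        A[i][i] += 1
        A[i][m] = Fraction(1)
        for c, pc in probs.items():
            if delta[s][c] in idx:  # entering a match state contributes 0
                A[i][idx[delta[s][c]]] -= pc
    for col in range(m):  # Gauss-Jordan elimination in exact arithmetic
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return A[idx[0]][m]
```

With a fair coin `{"H": Fraction(1, 2), "T": Fraction(1, 2)}`, `expected_waiting_time(["HH"], coin)` returns 6 while `expected_waiting_time(["HT"], coin)` returns 4: the self-overlapping word HH takes longer on average to appear. Because `joint_count_distribution` enumerates every reachable pair of automaton state and count vector, it is only practical for short texts; genome-length sequences such as those in the Chi example call for a symbolic or asymptotic treatment instead.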