Various criteria have been defined to evaluate the significance of sets of words, but they are often difficult to compute. We provide explicit expressions for the waiting time in this context. To assess the significance of a cluster of potential binding sites, we extend these expressions to the co-occurrence problem. We point out that the values of these criteria depend on a few fundamental parameters. We provide efficient algorithms to compute them that rely on a combinatorial interpretation of the formulae. We show that our results are very tight in the so-called twilight zone and improve on previous rough approximations. The text is assumed to be generated by a stationary Markov process. These results are developed for an extended model of consensus.
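To make the notion of waiting time concrete, here is a minimal sketch for a much simpler setting than the one treated in the paper: a single word in a text of independent, identically distributed letters (not a set of words, and not a Markov model). It uses the classical autocorrelation formula, in which every length k at which the word's prefix equals its suffix contributes the inverse probability of that prefix. The function name `expected_waiting_time` and the `probs` mapping are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def expected_waiting_time(word, probs):
    """Expected position at which `word` first ends in an i.i.d. random text.

    Assumption (not the paper's model): letters are drawn independently
    with the probabilities given in `probs`, a mapping letter -> probability.
    """
    total = Fraction(0)
    for k in range(1, len(word) + 1):
        # k counts only if the word overlaps itself on k letters,
        # i.e. its prefix of length k equals its suffix of length k.
        if word[:k] == word[-k:]:
            prefix_prob = Fraction(1)
            for letter in word[:k]:
                prefix_prob *= Fraction(probs[letter])
            total += 1 / prefix_prob
    return total


# Uniform DNA alphabet, used here purely as an illustrative input.
uniform_dna = {letter: Fraction(1, 4) for letter in "ACGT"}
print(expected_waiting_time("ACA", uniform_dna))
print(expected_waiting_time("ACG", uniform_dna))
```

In this simplified model a self-overlapping word such as ACA has a longer expected waiting time than a non-overlapping word of the same length such as ACG, which illustrates why overlap structure is among the parameters that significance criteria depend on.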