Expectation of Strings with Mismatches under Markov Chain Distribution

Authors:
Cinzia Pizzi; Mauro Bianco
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università di Padova, Italy; Department of Computer Science, Texas A&M University, USA
Venue:
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Year:
2009

Abstract

We study a problem related to the extraction of over-represented words from a given source text $x$ of length $n$. The words are allowed to occur with $k$ mismatches, and $x$ is produced by a source over an alphabet $\Sigma$ according to a Markov chain of order $p$. We propose an online algorithm that computes the expected number of occurrences of a word $y$ of length $m$ in $O(mk|\Sigma|^{p+1})$ time. We also propose an offline algorithm that computes the probability of any word that occurs in the text in $O(k|\Sigma|^2)$ time after $O(nk|\Sigma|^{p+1})$ preprocessing. This algorithm allows us to compute the expectation for all the words in a text of length $n$ in $O(kn^2|\Sigma|^2 + nk|\Sigma|^{p+1})$ time, rather than the $O(n^3|\Sigma|^{p+1})$ time achievable with other methods. Although this study was motivated by the motif discovery problem in bioinformatics, the results apply to any other domain involving combinatorics on words.
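The abstract does not spell out the recurrence behind the $O(mk|\Sigma|^{p+1})$ online bound. The Python sketch below shows one standard dynamic program with a bound of that shape. It is an illustration, not the authors' algorithm. It assumes a stationary chain whose transition table `trans` and stationary distribution `stationary` over $p$-grams are supplied by the caller (both hypothetical parameter names). It also reads "with $k$ mismatches" as "within Hamming distance at most $k$". Under stationarity every window of length $m$ has the same distribution, so the expectation is $(n - m + 1)$ times the probability that a single window is within $k$ mismatches of $y$.

```python
from itertools import product


def expected_occurrences(y, k, n, alphabet, p, trans, stationary):
    """Expected number of length-m windows of a length-n text that lie within
    Hamming distance k of y, assuming a STATIONARY order-p Markov chain.

    trans[ctx][a]   : P(next symbol = a | previous p symbols = ctx), ctx a tuple
    stationary[ctx] : stationary probability of the p-gram ctx (hypothetical input)
    """
    m = len(y)
    if m > n:
        return 0.0
    # f[e][ctx] = probability that the window prefix generated so far has
    # exactly e mismatches with y and that the last p symbols emitted are ctx.
    # Initially ctx is the p-gram just before the window, drawn from stationary.
    f = [dict() for _ in range(k + 1)]
    for ctx in product(alphabet, repeat=p):
        f[0][ctx] = stationary.get(ctx, 0.0)
    for j in range(m):
        # One step costs O(k |Sigma|^p * |Sigma|), i.e. O(k |Sigma|^(p+1)).
        g = [dict() for _ in range(k + 1)]
        for e in range(k + 1):
            for ctx, prob in f[e].items():
                if prob == 0.0:
                    continue
                for a in alphabet:
                    e2 = e + (a != y[j])
                    if e2 > k:
                        continue  # too many mismatches: prune this branch
                    # Slide the context window; for p = 0 the context stays empty.
                    nxt = (ctx + (a,))[1:] if p > 0 else ()
                    g[e2][nxt] = g[e2].get(nxt, 0.0) + prob * trans[ctx][a]
        f = g
    window_prob = sum(sum(level.values()) for level in f)
    return (n - m + 1) * window_prob


# Usage: uniform order-1 chain over DNA, word "AC", exact matches, n = 10.
dna = "ACGT"
uniform = {(c,): {a: 0.25 for a in dna} for c in dna}
pi = {(c,): 0.25 for c in dna}
print(expected_occurrences("AC", 0, 10, dna, 1, uniform, pi))  # 9 * 1/16
```

Each of the $m$ steps visits at most $(k+1)|\Sigma|^p$ states and $|\Sigma|$ next symbols, which gives $O(mk|\Sigma|^{p+1})$ time for $k \ge 1$. For a non-stationary chain, the single-window probability would have to be recomputed for each start position rather than multiplied by $(n - m + 1)$.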