Pattern matching statistics on correlated sources

  • Authors: Jérémie Bourdon; Brigitte Vallée

  • Affiliations: LINA, CNRS and Université de Nantes, France; GREYC, CNRS and Université de Caen, France

  • Venue: LATIN'06, Proceedings of the 7th Latin American Conference on Theoretical Informatics
  • Year: 2006

Abstract

In pattern matching algorithms, two characteristic parameters play an important rôle: the number of occurrences of a given pattern, and the number of positions where a pattern occurrence ends. Since many occurrences may end at the same position, these two parameters may differ significantly. Here, we consider a general framework where the text is produced by a probabilistic source, which can be built by a dynamical system. Such “dynamical sources” encompass the classical sources (memoryless sources and Markov chains) and may possess a high degree of correlation. We are mainly interested in two situations: (i) the pattern is a general word of a regular expression, and we study the number of occurrence positions; (ii) the pattern is a finite set of strings, and we study the number of occurrences. In both cases, we determine the mean and the variance of the parameter, and prove that its distribution is asymptotically Gaussian. In this way, we extend methods and results already obtained for classical sources (for instance in [9] and in [6]) to this general “dynamical” framework. Our methods use various techniques: formal languages and generating functions, as in previous works. However, in this correlated model, it is not possible to use a direct transfer into generating functions, and we mainly deal with generating operators which generate... generating functions.
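
As an informal illustration of why the two parameters can differ, the following minimal Python sketch counts both quantities for a finite set of strings on a fixed text. It is not taken from the paper: the function name `count_occurrences_and_end_positions` and the sample text and patterns are hypothetical, and the brute-force scan has nothing to do with the probabilistic analysis described in the abstract.

```python
def count_occurrences_and_end_positions(text, patterns):
    """Return (number of occurrences, number of distinct end positions).

    Hypothetical helper for illustration only: a brute-force scan that
    counts every (possibly overlapping) occurrence of every pattern.
    """
    occurrences = 0
    end_positions = set()
    for pattern in patterns:
        if not pattern:
            continue  # ignore the empty string, which matches everywhere
        start = text.find(pattern)
        while start != -1:
            occurrences += 1
            # Position just past the last character of this occurrence.
            end_positions.add(start + len(pattern))
            # Advance by one character so overlapping occurrences are found.
            start = text.find(pattern, start + 1)
    return occurrences, len(end_positions)


# "ab" occurs twice and "b" occurs twice, but each "b" ends where an "ab" ends.
print(count_occurrences_and_end_positions("abab", {"ab", "b"}))  # (4, 2)
```

In this toy case there are four occurrences but only two positions where an occurrence ends, because one pattern is a suffix of the other.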