On the Distribution of the Number of Missing Words in Random Texts

  • Authors:
  • Sven Rahmann; Eric Rivals

  • Affiliations:
  • Department of Computational Molecular Biology, Max-Planck-Institut für Molekulare Genetik, Ihnestraße 63-73, D-14195 Berlin, Germany (Sven.Rahmann@molgen.mpg.de); L.I.R.M.M., CNRS U.M.R. 5506, 161 rue Ada, F-34392 Montpellier Cedex 5, France (rivals@lirmm.fr)

  • Venue:
  • Combinatorics, Probability and Computing
  • Year:
  • 2003

Abstract

Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well-understood problem. We study a generalization: given a finite alphabet of size σ and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n+q−1 over the given alphabet? For q=1, X is the number Y of empty urns with σ urns and n balls. For q⩾2, X is related to the number Y of empty urns with σ^q urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, although some problems currently remain open.
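To make the two quantities concrete, here is a minimal Monte Carlo sketch (an illustration, not the authors' method) that samples both X and Y. It assumes a uniform i.i.d. random text whose letters are taken from the first σ lowercase ASCII letters. A text of length n+q−1 contains exactly n overlapping q-words, which is what allows the comparison with n balls in σ^q urns. All function names are hypothetical. Only E[Y] = m(1 − 1/m)^n for m urns is computed exactly; the mean of X is estimated by simulation, and no exact formula for it is claimed here.

```python
import random
import string


def count_missing_words(text: str, q: int, alphabet: str) -> int:
    """Return X: the number of length-q words over `alphabet` absent from `text`."""
    # A text of length n+q-1 contains exactly n (overlapping) q-word occurrences.
    seen = {text[i:i + q] for i in range(len(text) - q + 1)}
    return len(alphabet) ** q - len(seen)


def count_empty_urns(n: int, num_urns: int, rng: random.Random) -> int:
    """Return Y: empty urns after throwing n balls independently and uniformly."""
    occupied = {rng.randrange(num_urns) for _ in range(n)}
    return num_urns - len(occupied)


def expected_empty_urns(n: int, num_urns: int) -> float:
    """Classical E[Y] = m (1 - 1/m)^n for m urns and n balls."""
    return num_urns * (1 - 1 / num_urns) ** n


def simulate_means(sigma: int, q: int, n: int, trials: int, seed: int = 0):
    """Monte Carlo estimates of E[X] (overlapping q-words) and E[Y] (sigma^q urns)."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:sigma]  # assumption: sigma <= 26
    total_x = total_y = 0
    for _ in range(trials):
        # Uniform i.i.d. text of length n+q-1, i.e. n overlapping q-words.
        text = "".join(rng.choice(alphabet) for _ in range(n + q - 1))
        total_x += count_missing_words(text, q, alphabet)
        # Independent comparison: n balls into sigma^q urns.
        total_y += count_empty_urns(n, sigma ** q, rng)
    return total_x / trials, total_y / trials


if __name__ == "__main__":
    sigma, q, n = 2, 3, 8
    mean_x, mean_y = simulate_means(sigma, q, n, trials=20000)
    print(f"estimated E[X] = {mean_x:.3f}")
    print(f"estimated E[Y] = {mean_y:.3f}")
    print(f"exact     E[Y] = {expected_empty_urns(n, sigma ** q):.3f}")
```

For q=1 the two estimates should agree up to sampling noise, since X and Y then have the same law. For q⩾2 the gap between the estimated E[X] and the exact E[Y] reflects the effect of overlapping words.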