General study of the distribution of N-tuples of letters or words based on the distributions of the single letters or words

  • Authors:
  • L Egghe

  • Affiliations:
  • LUC, Universitaire Campus, B-3590 Diepenbeek, Belgium (permanent address), and UIA, Universiteitsplein 1, B-2610 Wilrijk, Belgium

  • Venue:
  • Mathematical and Computer Modelling: An International Journal
  • Year:
  • 2000

Quantified Score

Hi-index 0.98

Abstract

This paper establishes the general relation between the distribution of N-tuples of letters (e.g., N-truncations, N-grams) or words (e.g., N-word phrases) and the distributions of the single letters or words. The very general case is treated, in which there is dependence on the place i in the N-tuple: for each i = 1, ..., N, a different distribution of the letters or words is assumed. Concrete calculations are performed in the important case of Zipfian distributions (i.e., power laws) for the single letters or words. In this case, we prove that the distribution of the N-tuples (for fixed N) is a sum of power laws.
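
As a purely numerical illustration of the setting described in the abstract, the following minimal sketch builds a rank-ordered distribution of N-tuples from position-dependent Zipfian distributions of single symbols. It is not the paper's derivation. It assumes that the positions are independent, so that a tuple's probability is the product of its per-position probabilities; the abstract does not state this. The helper names `zipf` and `tuple_rank_distribution`, the alphabet size of 26, and the exponents 1.0 and 1.5 are hypothetical choices made only for the example.

```python
from itertools import product
from math import prod


def zipf(num_symbols, exponent):
    """Normalized power law over ranks 1..num_symbols: p(r) proportional to r**(-exponent)."""
    weights = [r ** -exponent for r in range(1, num_symbols + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def tuple_rank_distribution(position_dists):
    """Return the probabilities of all N-tuples, sorted from most to least likely.

    ASSUMPTION (not taken from the abstract): positions are independent, so the
    probability of a tuple is the product of its per-position probabilities.
    """
    probs = [prod(combo) for combo in product(*position_dists)]
    return sorted(probs, reverse=True)


# Hypothetical parameters: N = 2, 26 symbols, a different exponent at each position.
dists = [zipf(26, 1.0), zipf(26, 1.5)]
ranked = tuple_rank_distribution(dists)

# Print the first few ranks of the resulting 2-tuple distribution.
for rank, p in enumerate(ranked[:5], start=1):
    print(rank, round(p, 5))
```

Plotting `ranked` against rank on log-log axes gives a quick empirical view of the N-tuple distribution for these assumed inputs, which can then be compared with the closed-form result stated in the abstract.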