Random texts exhibit Zipf's-law-like word frequency distribution

  • Authors: W. Li

  • Affiliations: Rockefeller Univ., New York, NY

  • Venue: IEEE Transactions on Information Theory

  • Year: 2006

Abstract

It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. Two features, that a word's frequency of occurrence is almost an inverse power-law function of its rank and that the exponent of this power law is very close to 1, are largely due to the transformation from the word's length to its rank, which stretches an exponential function into a power-law function.
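
The length-to-rank argument in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes the simplest random-text model, in which each of M letters and the space character are drawn independently with equal probability. Under that assumption there are M^L distinct words of length L, each occurring with probability proportional to (M+1)^(-L), so a word of length L sits at a rank on the order of M^L and the implied power-law exponent is log(M+1)/log(M), which approaches 1 as M grows. The function names, the four-letter alphabet, the sample size, and the choice to discard empty words are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abcd", seed=0):
    """Draw n_chars symbols uniformly from the alphabet plus a space (assumed model)."""
    rng = random.Random(seed)
    symbols = alphabet + " "
    return "".join(rng.choices(symbols, k=n_chars))


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    str.split() drops the empty strings produced by consecutive spaces,
    which is a simplification chosen for this sketch.
    """
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def predicted_exponent(m):
    """Exponent implied by the length-to-rank argument for m equiprobable letters."""
    return math.log(m + 1) / math.log(m)


def fitted_exponent(freqs, max_rank):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped."""
    xs = [math.log(r) for r in range(1, max_rank + 1)]
    ys = [math.log(f) for f in freqs[:max_rank]]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Compare the predicted exponent with a rough fit over the top 300 ranks.
freqs = rank_frequency(random_text(200_000))
print(predicted_exponent(4), fitted_exponent(freqs, 300))
```

Because all words of the same length are equally likely in this model, the rank-frequency curve is a staircase rather than a smooth line, so the fitted slope is only a rough estimate and depends on the rank range used. A small alphabet is used here so that each step of the staircase is well sampled; for a 26-letter alphabet the predicted exponent is log(27)/log(26), roughly 1.01.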