Random texts exhibit Zipf's-law-like word frequency distribution

Authors:
W. Li
Affiliations:
Rockefeller Univ., New York, NY
Venue:
IEEE Transactions on Information Theory
Year:
2006

Citing 0
Cited 5

Density index and proximity search in large graphs
Proceedings of the 21st ACM international conference on Information and knowledge management

Efficient fuzzy search in large text collections
ACM Transactions on Information Systems (TOIS)

Semantic stability in social tagging streams
Proceedings of the 23rd international conference on World wide web

A novel graphical representation of sentence complexity: the description and its application
Scientometrics

Enumerating maximal bicliques in bipartite graphs with favorable degree sequences
Information Processing Letters

Quantified Score

Hi-index	754.84

Abstract

It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power-law function of its rank, and that the exponent of this inverse power law is very close to 1, are largely due to the transformation from the word's length to its rank, which stretches an exponential function into a power-law function.
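
The length-to-rank argument in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper's text as reproduced here; it assumes the simplest random-text model, in which M letters and one space character are all drawn with equal probability. Under that assumption a particular word of length L occurs with probability proportional to (M + 1)^(-L), while its rank r is of order M^L, so the frequency falls off roughly as r^(-a) with a = ln(M + 1) / ln(M), which is about 1.01 for M = 26. The function names, the seed, and the `min_count` cutoff are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def theoretical_exponent(num_letters):
    """Exponent ln(M + 1) / ln(M), assuming M equiprobable letters plus a space."""
    return math.log(num_letters + 1) / math.log(num_letters)


def random_text_counts(num_letters, num_chars, seed=0):
    """Generate a random text and return (word, count) pairs, most frequent first."""
    rng = random.Random(seed)
    # Assumption: every letter and the space are equally likely.
    alphabet = "abcdefghijklmnopqrstuvwxyz"[:num_letters] + " "
    text = "".join(rng.choices(alphabet, k=num_chars))
    # str.split() with no argument drops the empty words between repeated spaces.
    return Counter(text.split()).most_common()


def fit_exponent(counts):
    """Least-squares slope of log(count) against log(rank), with the sign flipped.

    `counts` must be positive and sorted in descending order.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


if __name__ == "__main__":
    num_letters = 4  # a small alphabet keeps the sample well populated
    ranked = random_text_counts(num_letters, num_chars=200_000, seed=1)
    # Hypothetical cutoff: drop rare words, whose counts are dominated by noise.
    min_count = 5
    counts = [count for _, count in ranked if count >= min_count]
    print("theoretical exponent:", theoretical_exponent(num_letters))
    print("fitted exponent:     ", fit_exponent(counts))
    print("top ranked words:    ", ranked[:8])
```

In this model all words of the same length are equally likely, so the rank-frequency curve is a staircase with one step per word length, and the power law describes its envelope. The fitted value therefore depends on the sample size and on the cutoff, and should be read as a rough check rather than a precise estimate.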