A formal derivation of Heaps' Law

  • Authors:
  • D. C. van Leijenhorst; Th. P. van der Weide

  • Affiliations:
  • Department of Computer Science, Faculty of Mathematics and Computing Science, Radboud University of Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, Netherlands;Department of Computer Science, Faculty of Mathematics and Computing Science, Radboud University of Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, Netherlands

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal
  • Year:
  • 2005

Abstract

Word frequencies in text documents can be reasonably described by the Mandelbrot distribution, which has Zipf's Law as a special case. Furthermore, the growth of vocabulary size as a function of text size (the number of words in the text) has been described by Heaps' Law. It has been shown that these two experimental laws are related. In this paper we go a step further and provide a formal derivation of Heaps' Law from the Mandelbrot distribution. We also specify the range of validity within which Heaps' Law applies.
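
The relationship described in the abstract can be explored numerically. The sketch below is an illustration only, not the derivation given in the paper. It assumes that words are drawn independently from a finite vocabulary whose rank-i probability is proportional to 1 / (i + b)^a (the usual form of the Mandelbrot distribution, with Zipf's Law at a = 1, b = 0), and it computes the expected number of distinct words after n draws. The function names and the parameter values (a = 1.5, b = 2.0, 100,000 ranks) are hypothetical choices made for this example.

```python
import math


def mandelbrot_probs(num_ranks, a, b):
    """Probabilities p_i proportional to 1 / (i + b)**a for ranks 1..num_ranks."""
    weights = [1.0 / (i + b) ** a for i in range(1, num_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_vocabulary(probs, n):
    """Expected number of distinct words after n independent draws.

    A word with probability p appears at least once with
    probability 1 - (1 - p)**n, and expectations add up over words.
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def local_heaps_exponent(probs, n1, n2):
    """Slope of log V(n) against log n between text sizes n1 and n2."""
    v1 = expected_vocabulary(probs, n1)
    v2 = expected_vocabulary(probs, n2)
    return (math.log(v2) - math.log(v1)) / (math.log(n2) - math.log(n1))


if __name__ == "__main__":
    # Assumed, illustrative parameters; they are not taken from the paper.
    probs = mandelbrot_probs(num_ranks=100_000, a=1.5, b=2.0)
    for n in (100, 1_000, 10_000):
        print(n, round(expected_vocabulary(probs, n), 1))
    print("local exponent:", round(local_heaps_exponent(probs, 1_000, 10_000), 3))
```

If the vocabulary curve follows Heaps' Law, V(n) ≈ K·n^β, the local exponent should stay roughly constant over a range of text sizes. Because the vocabulary here is finite, the curve must eventually flatten, which is one informal way to see why a range of validity matters.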