Word frequencies in text documents can be described reasonably well by the Mandelbrot distribution, which has Zipf's Law as a special case. The growth of vocabulary size as a function of text size (the number of words in the text) is described by Heaps' Law. These two empirical laws have been shown to be related. In this paper we go a step further and provide a formal derivation of Heaps' Law from the Mandelbrot distribution. We also specify the range of validity within which Heaps' Law applies.
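The relation between the two laws can be illustrated numerically. The sketch below is not the paper's derivation; it is a small simulation under stated assumptions. It takes the usual textbook forms of the laws, with the word at rank r having frequency proportional to (r + b)^(-alpha) (Mandelbrot; Zipf's Law is the case b = 0) and vocabulary size V(n) ≈ K · n^beta for a text of n words (Heaps). The symbols alpha, b, K, and beta, the parameter values, the finite number of word types, and all function names are assumptions chosen for illustration and do not come from the paper.

```python
import math
import random
from itertools import accumulate


def mandelbrot_weights(num_types, alpha, b):
    """Unnormalised Mandelbrot weights (r + b)^(-alpha) for ranks 1..num_types."""
    return [(r + b) ** (-alpha) for r in range(1, num_types + 1)]


def vocabulary_growth(num_tokens, num_types=50_000, alpha=1.2, b=2.7, seed=0):
    """Sample a synthetic text and return V(n), the number of distinct
    words seen after n tokens, for n = 1..num_tokens.

    The parameter defaults are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    cum = list(accumulate(mandelbrot_weights(num_types, alpha, b)))
    tokens = rng.choices(range(num_types), cum_weights=cum, k=num_tokens)
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def fit_heaps(growth, start=1000):
    """Least-squares fit of log V = log K + beta * log n for n >= start."""
    ns = range(start, len(growth) + 1)
    xs = [math.log(n) for n in ns]
    ys = [math.log(growth[n - 1]) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta


growth = vocabulary_growth(100_000)
K, beta = fit_heaps(growth)
print(f"fitted Heaps form: V(n) ~ {K:.2f} * n^{beta:.3f}")
```

If the simulated vocabulary growth is close to a straight line on a log-log plot, the fitted beta gives the Heaps exponent for the chosen Mandelbrot parameters. Because the sketch uses a finite number of word types, the fit can only be expected to hold while the text is short relative to the vocabulary, which is the kind of validity restriction the abstract refers to.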