Word familiarity distributions to understand Heaps' law of vocabulary growth of the Internet forums

  • Authors:
  • Masao Kubo; Hiroshi Sato; Takashi Matsubara

  • Affiliations:
  • National Defense Academy of Japan, Kanagawa, Japan; National Defense Academy of Japan, Kanagawa, Japan; National Defense Academy of Japan, Kanagawa, Japan

  • Venue:
  • KES'11 Proceedings of the 15th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems - Volume Part III
  • Year:
  • 2011

Abstract

In this study, lexical analysis is applied to the log data of conversations on Internet forums. Many statistical regularities in documents are well known, for example Zipf's law and Heaps' law, and this type of analysis has been applied to documents in various media. However, few studies apply it to documents written by many authors, such as the conversation logs of Internet forums. Usually, the relationship between document size and these regularities is not considered important, because the size of a document is determined by its author, who is normally a single person. The size of an Internet forum's communication log, in contrast, is an emergent property arising from the people who are interested in the forum. We believe that it is therefore important to understand the dynamics of conversations. The investigation in this study found the following trend: the number of posted messages is small if the vocabulary growth parameter β of Heaps' law is not within a preferred range. Additionally, this study proposes a new explanation, based on the multiple-author environment, for the differences in this parameter β. Traditionally, documents written by more than one person, for example web sites and programming languages, are analyzed from a single-author point of view. This traditional approach is very important but not sufficient, because it cannot address the differences in vocabulary among the individual authors.
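
For readers unfamiliar with the parameter β, Heaps' law is commonly written as V(n) = K·n^β, where V(n) is the number of distinct words seen after n tokens. The abstract does not state how the authors estimate β, so the following is only a minimal sketch of one common approach: an ordinary least-squares fit on the log-log growth curve. The function names `vocabulary_growth` and `fit_heaps` are hypothetical, and the input is assumed to be an already tokenized forum log.

```python
import math


def vocabulary_growth(tokens):
    """Return (n, V(n)) pairs: tokens read so far vs. distinct words seen so far."""
    seen = set()
    points = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        points.append((n, len(seen)))
    return points


def fit_heaps(points):
    """Fit log V = log K + beta * log n by least squares; return (K, beta).

    Assumption: this is an illustrative estimator, not the paper's method.
    Requires at least two points.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope of the log-log curve
    k = math.exp(mean_y - beta * mean_x)      # intercept, back on a linear scale
    return k, beta


if __name__ == "__main__":
    # Toy input; a real forum log would be tokenized message by message.
    log_tokens = "the cat sat on the mat and the dog sat on the cat".split()
    k, beta = fit_heaps(vocabulary_growth(log_tokens))
    print(f"K = {k:.3f}, beta = {beta:.3f}")
```

As a sanity check on the two extremes, a log in which every token is new gives β = 1, and a log that repeats a single word gives β = 0; real text falls between them.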