Distributions of functional and content words differ radically

  • Authors:
  • Igor A. Bolshakov; Denis M. Filatov

  • Affiliations:
  • Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico;Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico

  • Venue:
  • MICAI'06: Proceedings of the 5th Mexican International Conference on Artificial Intelligence
  • Year:
  • 2006

Abstract

We consider statistical properties of prepositions, the most numerous and important functional words in European languages. They typically link verbs and nouns to nouns syntactically. It is shown that their rank distributions in Russian differ radically from those of content words, being much more compact. The Zipf distribution commonly used for content words fails for them, so approximations that are flatter at the first ranks and steeper at higher ranks are applicable instead. For this purpose, the Mandelbrot family and an expo-logarithmic family of distributions are tested, and the difference between the two least-squares approximations turns out to be insignificant. It is shown that the first dozen ranks cover more than 80% of all preposition occurrences in a database of Russian collocations of the Verb-Preposition-Noun and Noun-Preposition-Noun types, leaving hardly any room for the remaining two hundred or so Russian prepositions.
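
To make the contrast concrete, the following minimal sketch compares how much probability mass the first dozen ranks carry under a plain Zipf law and under a Mandelbrot-type law of the form p(r) ∝ 1/(r + b)^a. The inventory size of 200, the Zipf exponent of 1, and the Mandelbrot parameters a = 3, b = 5 are illustrative assumptions only. They are not values fitted in the paper, and the function names are hypothetical.

```python
def rank_probabilities(n_ranks, exponent, shift=0.0):
    """Normalized rank probabilities p(r) proportional to 1 / (r + shift) ** exponent.

    shift = 0 gives a plain Zipf law; shift > 0 gives a Mandelbrot-type law.
    """
    weights = [1.0 / (r + shift) ** exponent for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def coverage(probs, k):
    """Share of all occurrences accounted for by the k most frequent ranks."""
    return sum(probs[:k])


N_PREPOSITIONS = 200  # assumed inventory size, for illustration only

# Plain Zipf law with exponent 1 (assumed).
zipf = rank_probabilities(N_PREPOSITIONS, exponent=1.0)

# Mandelbrot-type law with assumed parameters: flatter over the first
# ranks (because of the shift) and steeper in the tail (larger exponent).
mandelbrot = rank_probabilities(N_PREPOSITIONS, exponent=3.0, shift=5.0)

zipf_top12 = coverage(zipf, 12)
mandelbrot_top12 = coverage(mandelbrot, 12)

print(f"Zipf, first 12 ranks:       {zipf_top12:.3f}")
print(f"Mandelbrot, first 12 ranks: {mandelbrot_top12:.3f}")
```

Under these assumed parameters the plain Zipf law gives the first twelve ranks only about half of the total mass, while the shifted, steeper law concentrates well over 80% there, which is the kind of compactness the abstract reports for prepositions.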