Counting lumps in word space: density as a measure of corpus homogeneity

  • Authors:
  • Magnus Sahlgren; Jussi Karlgren

  • Affiliations:
  • SICS, Swedish Institute of Computer Science, Kista, Sweden (both authors)

  • Venue:
  • SPIRE'05: Proceedings of the 12th International Conference on String Processing and Information Retrieval
  • Year:
  • 2005

Abstract

This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.
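
The abstract does not spell out how neighborhood density is computed, so the following is only a minimal sketch of one possible reading, not the authors' definition. It assumes that a word's neighborhood is the set of other words whose cosine similarity to it meets a fixed threshold, and that the density of a word space is the mean neighborhood size. The function names, the threshold value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighborhood_density(vectors, threshold=0.8):
    """Mean number of neighbors per word.

    ASSUMPTION: a neighbor is any other word whose cosine similarity
    meets `threshold`; the paper may define neighborhoods differently.
    """
    words = list(vectors)
    if not words:
        return 0.0
    total = 0
    for w in words:
        total += sum(
            1
            for x in words
            if x != w and cosine(vectors[w], vectors[x]) >= threshold
        )
    return total / len(words)


# Toy word spaces, invented purely for illustration.
clustered = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "pet": [1.0, 0.0]}
scattered = {"cat": [1.0, 0.0], "tax": [0.0, 1.0], "ion": [-1.0, 0.0]}

density_clustered = neighborhood_density(clustered)
density_scattered = neighborhood_density(scattered)
```

Under these assumptions the two toy spaces receive different scores, which is the kind of contrast a density-based comparison of corpora would rely on. How the score maps onto topical dispersion is left to the paper itself.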