Counting lumps in word space: density as a measure of corpus homogeneity

Authors:
Magnus Sahlgren; Jussi Karlgren
Affiliations:
SICS, Swedish Institute of Computer Science, Kista, Sweden; SICS, Swedish Institute of Computer Science, Kista, Sweden
Venue:
SPIRE '05: Proceedings of the 12th International Conference on String Processing and Information Retrieval
Year:
2005

Citing 0
Cited 1

Terminology mining in social media

Proceedings of the 18th ACM conference on Information and knowledge management

Abstract

This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.
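
To make the idea of neighborhood density concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not define the measure, so everything below is an assumption made for illustration. The function names (`build_word_space`, `cosine`, `neighborhood_density`), the co-occurrence window, the cosine similarity, and the 0.5 threshold are all hypothetical choices. The sketch builds a simple co-occurrence word space and reports the average number of neighbors each word has above the similarity threshold.

```python
import math
from collections import Counter, defaultdict


def build_word_space(sentences, window=2):
    """Map each word to a sparse co-occurrence vector (assumed representation)."""
    space = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return dict(space)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighborhood_density(space, threshold=0.5):
    """Average number of neighbors per word with similarity >= threshold.

    The threshold and the averaging are illustrative assumptions,
    not the definition used in the paper.
    """
    words = list(space)
    if not words:
        return 0.0
    neighbor_counts = dict.fromkeys(words, 0)
    for a in range(len(words)):
        for b in range(a + 1, len(words)):
            if cosine(space[words[a]], space[words[b]]) >= threshold:
                neighbor_counts[words[a]] += 1
                neighbor_counts[words[b]] += 1
    return sum(neighbor_counts.values()) / len(words)


# Toy corpus: "a" and "c" share the same context word "b".
toy_corpus = [["a", "b"], ["c", "b"]]
toy_space = build_word_space(toy_corpus, window=1)
toy_density = neighborhood_density(toy_space, threshold=0.5)
```

Under these assumptions, one would compute such a score separately for each corpus and compare the values. How a given density value relates to topical dispersion is for the paper to establish; this sketch only shows the mechanics of counting neighbors in a word space.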