Measuring topic homogeneity and its application to dictionary-based word sense disambiguation

  • Authors:
  • Ann Gledson; John Keane

  • Affiliations:
  • University of Manchester, Manchester, UK; University of Manchester, Manchester, UK

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Abstract

The use of topical features is widespread in Natural Language Processing (NLP), a major example being dictionary-based Word Sense Disambiguation (WSD). Yet previous research has not attempted to measure the level of topic cohesion in documents, despite assertions about its effects. This paper introduces a quantitative measure of Topic Homogeneity that draws on a range of NLP resources and does not require prior knowledge of the correct senses. Evaluation is first performed by using the WordNet::Domains package to create word sets with varying levels of homogeneity and comparing our results with the expected ones. Additionally, to evaluate each measure's potential value, the homogeneity results are correlated with the accuracy of three co-occurrence/dictionary-based WSD techniques, tested on 1,040 SemCor and SENSEVAL sub-documents. Many low-to-moderate correlations are found, several of them in the moderate range (above .40). These correlations exceed those of polysemy and sense entropy, the two most frequently cited factors affecting WSD. Finally, a combined homogeneity measure achieves correlations of up to .52.
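The evaluation idea in the abstract is to ask which per-document factor (a homogeneity score, polysemy, or sense entropy) tracks WSD accuracy most closely. The following is a minimal sketch of that comparison. It assumes Pearson's r as the correlation statistic and defines sense entropy as the Shannon entropy of a word's sense-frequency distribution. The abstract specifies neither choice, and the function names and toy numbers are hypothetical, not the authors' implementation or data.

```python
import math
from statistics import mean


def sense_entropy(sense_counts):
    """Shannon entropy (in bits) of a word's sense-frequency distribution.

    Assumption: sense_counts holds corpus frequencies of each sense of one word.
    """
    total = sum(sense_counts)
    return -sum((c / total) * math.log2(c / total) for c in sense_counts if c > 0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-sub-document values, for illustration only.
    homogeneity = [0.31, 0.45, 0.52, 0.60, 0.72]
    mean_entropy = [1.9, 1.4, 1.7, 1.2, 1.5]
    wsd_accuracy = [0.48, 0.55, 0.57, 0.63, 0.70]

    # Compare how strongly each factor correlates with WSD accuracy.
    print("homogeneity vs accuracy:", round(pearson_r(homogeneity, wsd_accuracy), 3))
    print("sense entropy vs accuracy:", round(pearson_r(mean_entropy, wsd_accuracy), 3))
```

In this framing, a homogeneity measure is useful to the extent that its |r| with accuracy exceeds that of the baseline factors. The abstract reports that outcome against polysemy and sense entropy.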