Measuring topic homogeneity and its application to dictionary-based word sense disambiguation

Authors:
Ann Gledson;John Keane
Affiliations:
University of Manchester, Manchester, UK;University of Manchester, Manchester, UK
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 18
Cited 1

An Iterative Approach to Word Sense Disambiguation

Proceedings of the Thirteenth International Florida Artificial Intelligence Research Society Conference
Using the web to obtain frequencies for unseen bigrams

Computational Linguistics - Special issue on web as corpus
Lexical cohesion computed by thesaural relations as an indicator of the structure of text

Computational Linguistics
TextTiling: segmenting text into multi-paragraph subtopic passages

Computational Linguistics
Using corpus statistics and WordNet relations for sense identification

Computational Linguistics - Special issue on word sense disambiguation
The role of domain information in Word Sense Disambiguation

Natural Language Engineering
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A method for word sense disambiguation of unrestricted text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity

Computational Linguistics
Novel association measures using web search with double checking

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Measuring semantic similarity between words using web search engines

Proceedings of the 16th international conference on World Wide Web
Googleology is Bad Science

Computational Linguistics
Using lexical chains for keyword extraction

Information Processing and Management: an International Journal
Unsupervised acquisition of predominant word senses

Computational Linguistics
WordNet: similarity - measuring the relatedness of concepts

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Using web-search results to measure word-group similarity

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Using measures of semantic relatedness for word sense disambiguation

CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Using web-search results to measure word-group similarity

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

The use of topical features is abundant in Natural Language Processing (NLP), a major example being in dictionary-based Word Sense Disambiguation (WSD). Yet previous research does not attempt to measure the level of topic cohesion in documents, despite assertions of its effects. This paper introduces a quantitative measure of Topic Homogeneity using a range of NLP resources and not requiring prior knowledge of correct senses. Evaluation is performed firstly by using the WordNet::Domains package to create word-sets with varying levels of homogeneity and comparing our results with those expected. Additionally, to evaluate each measure's potential value, the homogeneity results are correlated against those of 3 co-occurrence/dictionary-based WSD techniques, tested on 1040 Semcor and SENSEVAL sub-documents. Many low-moderate correlations are found to exist with several in the moderate range (above .40). These correlations surpass polysemy and senseentropy, the 2 most cited factors affecting WSD. Finally, a combined homogeneity measure achieves correlations of up to .52.