Topical clustering of MRD senses based on information retrieval techniques

  • Authors:
  • Jen Nan Chen; Jason S. Chang

  • Affiliations:
  • National Tsing Hua University; National Tsing Hua University

  • Venue:
  • Computational Linguistics - Special issue on word sense disambiguation
  • Year:
  • 1998

Abstract

This paper describes a heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in an MRD-based lexical database offers several benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so that unnecessarily fine sense distinctions can be avoided. Second, the clustered entries in the MRD can serve as material for supervised training of a WSD system. Furthermore, if the algorithm is run on several MRDs, the clusters also provide a means of linking senses across multiple MRDs to create an integrated lexical database. We describe an implementation of the method that clusters the definition sentences of the Longman Dictionary of Contemporary English (LDOCE) using the topical word lists and topical cross-references of the Longman Lexicon of Contemporary English (LLOCE). Nearly half of the senses in LDOCE can be linked precisely to a relevant LLOCE topic using a simple heuristic. If the definitions of the senses linked to the same topic are viewed as a single document, topical clustering of MRD senses closely resembles the retrieval of relevant documents for a given query in information retrieval (IR) research. Well-established IR techniques for weighting terms and ranking document relevance are therefore applied to find the topical clusters most relevant to the definition of each MRD sense. Finally, we assess the performance of the implemented algorithms for LDOCE and LLOCE in a series of experiments and evaluations.
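The IR analogy above can be illustrated with a minimal sketch: each topic's pooled sense definitions form a "document," the definition of an unclustered sense acts as the "query," and topics are ranked by relevance. The abstract does not specify the weighting or ranking formulas, so this sketch assumes plain tf-idf weighting and cosine similarity, a common IR baseline. The topic names (`topic_finance`, `topic_water`) and definitions are hypothetical toy data, not actual LDOCE or LLOCE entries, and the functions are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer with light punctuation stripping (assumption).
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def build_idf(documents):
    # Inverse document frequency: log(N / df) over the topic "documents".
    n = len(documents)
    df = Counter()
    for tokens in documents.values():
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Term frequency times idf; terms unseen in any topic document are dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_topics(sense_definition, topic_docs):
    # Pool each topic's linked definitions into one document, then rank
    # topics by cosine similarity to the query definition.
    docs = {topic: tokenize(" ".join(defs)) for topic, defs in topic_docs.items()}
    idf = build_idf(docs)
    doc_vecs = {topic: tfidf_vector(toks, idf) for topic, toks in docs.items()}
    query = tfidf_vector(tokenize(sense_definition), idf)
    scores = {topic: cosine(query, vec) for topic, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: definitions already linked to each topic.
topic_docs = {
    "topic_finance": ["a place where money is kept and lent", "money paid for work"],
    "topic_water": ["land along the side of a river", "a large area of water"],
}

ranking = rank_topics("an organization that keeps money and lends it", topic_docs)
print(ranking)
```

Here the query shares weighted terms only with `topic_finance`, so that topic ranks first and `topic_water` scores zero. A real system would at least add stemming (so that "keeps" matches "kept") and stop-word removal.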