Not as easy as it seems: automating the construction of lexical chains using Roget's thesaurus

  • Authors:
  • Mario Jarmasz;Stan Szpakowicz

  • Affiliations:
  • School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada;School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada

  • Venue:
  • AI'03 Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Morris and Hirst [10] present a method of linking significant words that are about the same topic. The resulting lexical chains are a means of identifying cohesive regions in a text, with applications in many natural language processing tasks, including text summarization. The first lexical chains were constructed manually using Roget's International Thesaurus. Morris and Hirst wrote that automation would be straightforward given an electronic thesaurus. All applications so far have used WordNet to produce lexical chains, perhaps because adequate electronic versions of Roget's were not available until recently. We discuss the building of lexical chains using an electronic version of Roget's Thesaurus. We implement a variant of the original algorithm, and explain the necessary design decisions. We include a comparison with other implementations.