An automatic method for generating sense tagged corpora

  • Authors:
  • Rada Mihalcea;Dan I. Moldovan

  • Affiliations:
  • -;-

  • Venue:
  • AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy results when large corpora are available to develop context rules, to train and test them.This paper presents a novel approach to automatically generate arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from Internet using existing search engines. The method was tested on 120 word senses and a precision of 91% was observed.