Grammatical category disambiguation by statistical optimization

  • Authors:
  • Steven J. DeRose

  • Affiliations:
  • Brown University, Dallas, TX

  • Venue:
  • Computational Linguistics
  • Year:
  • 1988

Quantified Score

Hi-index 0.00

Visualization

Abstract

Several algorithms have been developed in the past that attempt to resolve categorial ambiguities in natural language text without recourse to syntactic or semantic level information. An innovative method (called "CLAWS") was recently developed by those working with the Lancaster-Oslo/Bergen Corpus of British English. This algorithm uses a systematic calculation based upon the probabilities of co-occurrence of particular tags. Its accuracy is high, but it is very slow, and it has been manually augmented in a number of ways. The effects upon accuracy of this manual augmentation are not individually known.The current paper presents an algorithm for disambiguation that is similar to CLAWS but that operates in linear rather than in exponential time and space, and which minimizes the unsystematic augments. Tests of the algorithm using the million words of the Brown Standard Corpus of English are reported; the overall accuracy is 96%. This algorithm can provide a fast and accurate front end to any parsing or natural language processing system for English.