Automatic stochastic tagging of natural language texts

  • Authors: Evangelos Dermatas; George Kokkinakis
  • Affiliations: University of Patras; University of Patras
  • Venue: Computational Linguistics
  • Year: 1995

Abstract

Five language- and tagset-independent stochastic taggers, which handle both morphological and contextual information, are presented and tested on corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags: a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. Unknown words are tagged using an experimentally validated stochastic hypothesis that links the stochastic behavior of unknown words to that of the less probable known words. A fully automatic training and tagging program has been implemented on an IBM PC-compatible 80386-based computer. Measurements of error rate, response time, and memory requirements show that the taggers' performance is satisfactory even when only a small training text is available. The error rate decreases when new texts are used to update the stochastic model parameters.
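
The unknown-word hypothesis in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "less probable known words" means words seen at most `max_count` times in the training corpus (a hypothetical threshold, defaulting to words seen exactly once), and that the tag distribution of those rare words is used directly as the estimate for unknown words. The function name, the toy corpus, and the tag labels are all illustrative assumptions.

```python
from collections import Counter


def unknown_word_tag_distribution(tagged_corpus, max_count=1):
    """Estimate P(tag | unknown word) from the rarest known words.

    tagged_corpus: iterable of (word, tag) pairs.
    max_count: assumed cutoff; words seen at most this many times
               count as "less probable known words".
    """
    tagged_corpus = list(tagged_corpus)
    word_counts = Counter(word for word, _ in tagged_corpus)

    # Collect the tags carried by rare words only.
    tag_counts = Counter(
        tag for word, tag in tagged_corpus if word_counts[word] <= max_count
    )

    total = sum(tag_counts.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in tag_counts.items()}


# Toy training data (hypothetical, for illustration only).
corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("a", "DET"),
]

dist = unknown_word_tag_distribution(corpus)
```

In this toy corpus the words seen once are "barks", "cat", "sleeps", "runs", and "a", so the estimate assigns most of the probability mass to the verb tag. A real tagger would combine such an estimate with contextual (tag-sequence) probabilities, which this sketch omits.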