A reconfigurable stochastic tagger for languages with complex tag structure

  • Authors:
  • Łukasz Dębowski

  • Affiliations:
  • Polish Academy of Sciences

  • Venue:
  • MorphSlav '03: Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
  • Year:
  • 2003

Abstract

We present a case study of a complex stochastic disambiguator of alternative morphosyntactic tags. It supports incomplete disambiguation, shorthand tag notation, an external tagset definition, and external definitions of multivalued context features. The tagger is based on Naive Bayes modeling and allows context features almost as general as those of classical trigram taggers, as well as more specific ones. Its preliminary results for Polish still do not meet our expectations. Possible sources of the tagger's failures are: inhomogeneity of the training corpus, which is still in preparation; the lack of an automatic search for probability models; and overly general conditional independence assumptions in defining the class of interpretable models.
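
As a rough illustration of the general idea only, the following sketch shows how a Naive Bayes model can choose among the alternative tags proposed for a token, given a few context features. It is not the tagger described in the paper: the class name, the feature names (prev_tag, next_tag), the tag labels, the toy training data, and the add-alpha smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDisambiguator:
    """Toy Naive Bayes chooser among candidate tags (illustrative, hypothetical)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # add-alpha smoothing (an assumption)
        self.tag_counts = Counter()              # count(tag)
        self.feat_counts = defaultdict(Counter)  # count((feature, value)) per tag
        self.feat_values = defaultdict(set)      # values observed for each feature

    def train(self, examples):
        # examples: iterable of (features_dict, gold_tag) pairs
        for feats, tag in examples:
            self.tag_counts[tag] += 1
            for name, value in feats.items():
                self.feat_counts[tag][(name, value)] += 1
                self.feat_values[name].add(value)

    def log_score(self, tag, feats):
        # log P(tag) + sum over features of log P(value | tag),
        # i.e. features are treated as conditionally independent given the tag
        total = sum(self.tag_counts.values())
        n_tags = max(len(self.tag_counts), 1)
        score = math.log(
            (self.tag_counts[tag] + self.alpha) / (total + self.alpha * n_tags)
        )
        for name, value in feats.items():
            n_values = len(self.feat_values[name] | {value})
            num = self.feat_counts[tag][(name, value)] + self.alpha
            den = self.tag_counts[tag] + self.alpha * n_values
            score += math.log(num / den)
        return score

    def disambiguate(self, candidates, feats):
        # Only the alternatives offered for the token are compared.
        return max(candidates, key=lambda t: self.log_score(t, feats))


# Hypothetical toy data: context features paired with the correct tag.
examples = [
    ({"prev_tag": "prep", "next_tag": "verb"}, "subst:acc"),
    ({"prev_tag": "prep", "next_tag": "punct"}, "subst:acc"),
    ({"prev_tag": "adj", "next_tag": "verb"}, "subst:nom"),
    ({"prev_tag": "bos", "next_tag": "verb"}, "subst:nom"),
]

tagger = NaiveBayesDisambiguator()
tagger.train(examples)
choice = tagger.disambiguate(
    ["subst:nom", "subst:acc"], {"prev_tag": "prep", "next_tag": "punct"}
)
print(choice)
```

Because each feature contributes an independent factor, features can be added or dropped without changing the rest of the model, which is one reason such a design is easy to reconfigure; the same independence assumption is also a plausible source of error when features are strongly correlated.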