A reconfigurable stochastic tagger for languages with complex tag structure

Authors:
Łukasz Dębowski
Affiliations:
Polish Academy of Sciences
Venue:
MorphSlav '03 Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
Year:
2003

Abstract

We present a case study of a complex stochastic tagger that disambiguates among alternative morphosyntactic tags. It allows for incomplete disambiguation, shorthand tag notation, an externally defined tagset, and externally defined multivalued context features. The tagger is based on Naive Bayes modeling and supports context features nearly as general as those of classical trigram taggers, as well as more specific ones. Its preliminary results for Polish do not yet meet our expectations. Possible sources of the tagger's failures include the inhomogeneity of the training corpus, which is still in preparation; the lack of an automatic search over probability models; and overly general conditional independence assumptions in defining the class of interpretable models.
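To make the Naive Bayes idea concrete, the sketch below scores each candidate tag t for a token by P(t) · ∏ P(f_i | t) over its context features and normalizes over the candidate set. It is a minimal illustration under stated assumptions, not the author's implementation: the class `NaiveBayesDisambiguator`, the add-alpha smoothing, the `keep_ratio` threshold used to model incomplete disambiguation (keeping every tag whose posterior is within a factor of the best one), and the toy tags and feature name `prev` are all hypothetical. The tagger's external tagset definitions, shorthand notation, and multivalued features are not reproduced here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDisambiguator:
    """Hypothetical sketch: rank candidate tags by P(t) * prod_i P(f_i | t)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # add-alpha smoothing (assumption)
        self.tag_counts = Counter()               # tag -> count
        self.feat_counts = defaultdict(Counter)   # feature name -> Counter[(tag, value)]
        self.feat_values = defaultdict(set)       # feature name -> observed values

    def train(self, examples):
        # examples: iterable of (tag, {feature_name: value}); assumes every
        # example carries the same feature names.
        for tag, feats in examples:
            self.tag_counts[tag] += 1
            for name, value in feats.items():
                self.feat_counts[name][(tag, value)] += 1
                self.feat_values[name].add(value)

    def log_score(self, tag, feats):
        total = sum(self.tag_counts.values())
        n_tags = len(self.tag_counts)
        # Smoothed log prior log P(t).
        score = math.log((self.tag_counts[tag] + self.alpha)
                         / (total + self.alpha * n_tags))
        for name, value in feats.items():
            # +1 reserves probability mass for feature values never seen in training.
            n_values = len(self.feat_values[name]) + 1
            count = self.feat_counts[name][(tag, value)]
            score += math.log((count + self.alpha)
                              / (self.tag_counts[tag] + self.alpha * n_values))
        return score

    def disambiguate(self, candidates, feats, keep_ratio=1.0):
        """Return (kept tags best-first, posteriors over the candidate set).

        keep_ratio=1.0 keeps only the best tag (full disambiguation); smaller
        values also keep close competitors (incomplete disambiguation).
        """
        scores = {t: self.log_score(t, feats) for t in candidates}
        best = max(scores.values())
        z = sum(math.exp(s - best) for s in scores.values())
        post = {t: math.exp(s - best) / z for t, s in scores.items()}
        top = max(post.values())
        ranked = sorted(post, key=post.get, reverse=True)
        return [t for t in ranked if post[t] >= keep_ratio * top], post


# Toy usage with made-up Polish-style tags and a single context feature.
model = NaiveBayesDisambiguator()
model.train([
    ("subst:sg:nom", {"prev": "adj:sg:nom"}),
    ("subst:sg:nom", {"prev": "adj:sg:nom"}),
    ("subst:sg:acc", {"prev": "fin:sg"}),
    ("subst:sg:acc", {"prev": "fin:sg"}),
    ("subst:sg:acc", {"prev": "prep:acc"}),
])
candidates = ["subst:sg:nom", "subst:sg:acc"]
kept, post = model.disambiguate(candidates, {"prev": "fin:sg"})
kept_partial, _ = model.disambiguate(candidates, {"prev": "fin:sg"}, keep_ratio=0.25)
print(kept, kept_partial)
```

With full disambiguation only the accusative reading survives after a finite verb; lowering `keep_ratio` leaves both readings in place, which is one simple way a tagger could defer hard decisions instead of guessing.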