The Linguistic Basis of a Rule-Based Tagger of Czech

Authors:
Karel Oliva;Milena Hnátková;Vladimir Petkevic;Pavel Kveton
Venue:
TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
Year:
2000

Citing 3

Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text

A simple rule-based part of speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing

Probabilistic and rule-based tagger of an inflective language: a comparison
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

Cited 3

Grammatical Agreement and Automatic Morphological Disambiguation of Inflectional Languages
TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue

Competing Patterns for Language Engineering
TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue

Automatic construction of a valency lexicon of Czech adjectives
TSD '05 Proceedings of the 8th international conference on Text, Speech and Dialogue

Abstract

This paper describes the conception of a rule-based tagger (part-of-speech disambiguator) of Czech currently developed for tagging the Czech National Corpus (cf. [2]). The input of the tagger consists of sentences whose words are assigned all possible morphological analyses. The tagger disambiguates this input by successive elimination of tags which are syntactically implausible in the sentential context of the particular word. Due to this, the tagger promises substantially higher accuracy than current stochastic taggers for Czech. This is documented by the results concerning the disambiguation of the most frequent ambiguous word form in Czech, the word se.
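
The elimination strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names (`REFL`, `PREP`, `NOUN-INS`, and so on), the function names, and the single rule are all hypothetical and chosen only for illustration. The rule is a deliberate simplification: it discards the prepositional reading of "se" when the next word has no instrumental reading. The sketch assumes every token starts with all of its candidate tags, that rules only ever remove tags, and that the last remaining tag of a token is never removed.

```python
from typing import Callable, List, Set, Tuple

Token = Tuple[str, Set[str]]                   # (word form, candidate tags)
Rule = Callable[[List[Token], int], Set[str]]  # tags to eliminate at position i


def drop_prep_without_instrumental(sent: List[Token], i: int) -> Set[str]:
    """Hypothetical rule: 'se' is not a preposition unless the next word
    has at least one instrumental reading (a simplification)."""
    form, tags = sent[i]
    if form.lower() != "se" or "PREP" not in tags:
        return set()
    if i + 1 < len(sent) and any(t.endswith("-INS") for t in sent[i + 1][1]):
        return set()
    return {"PREP"}


def disambiguate(sent: List[Token], rules: List[Rule]) -> List[Token]:
    """Apply the rules repeatedly until no rule removes any further tag."""
    work = [(form, set(tags)) for form, tags in sent]  # copy, keep input intact
    changed = True
    while changed:
        changed = False
        for i in range(len(work)):
            for rule in rules:
                doomed = rule(work, i) & work[i][1]
                # Remove only if at least one reading would survive.
                if doomed and doomed != work[i][1]:
                    work[i][1].difference_update(doomed)
                    changed = True
    return work


# "Petr se směje" ("Petr is laughing"): no instrumental follows "se".
sentence = [("Petr", {"NOUN-NOM"}), ("se", {"REFL", "PREP"}), ("směje", {"VERB"})]
result = disambiguate(sentence, [drop_prep_without_instrumental])
print(result[1])
```

In this toy run the prepositional reading of "se" is eliminated because the following verb form offers no instrumental reading, while in a phrase such as "se psem" ("with a dog") both readings would survive this particular rule. Repeating the pass until nothing changes lets a tag removed by one rule enable another rule on a later pass.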