The Linguistic Basis of a Rule-Based Tagger of Czech

  • Authors:
  • Karel Oliva;Milena Hnátková;Vladimir Petkevic;Pavel Kveton

  • Venue:
  • TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
  • Year:
  • 2000

Abstract

This paper describes the conception of a rule-based tagger (part-of-speech disambiguator) of Czech that is currently being developed for tagging the Czech National Corpus (cf. [2]). The input of the tagger consists of sentences whose words are assigned all possible morphological analyses. The tagger disambiguates this input by successively eliminating tags that are syntactically implausible in the sentential context of the particular word. Owing to this approach, the tagger promises substantially higher accuracy than current stochastic taggers for Czech. This is documented by results on the disambiguation of the most frequent ambiguous word form in Czech, the word se.
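
The elimination strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the `Rule` interface, the tag names (`NOUN`, `REFL`, `PREP`, `VERB`) and the single toy rule are all assumptions made for illustration. The only linguistic fact relied on is that se can be read either as a reflexive or as the vocalized form of the preposition s.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Token:
    form: str
    tags: Set[str]  # all candidate tags from morphological analysis


# Hypothetical rule interface: a rule inspects position i in the sentence
# and returns the tags of that token that are implausible in this context.
Rule = Callable[[List[Token], int], Set[str]]


def no_preposition_without_nominal(sentence: List[Token], i: int) -> Set[str]:
    """Toy rule (an assumption, not taken from the paper): a PREP reading is
    implausible if no token follows or the next token cannot be nominal."""
    nominal = {"NOUN", "ADJ", "PRON", "NUM"}
    if "PREP" not in sentence[i].tags:
        return set()
    if i + 1 == len(sentence) or not (sentence[i + 1].tags & nominal):
        return {"PREP"}
    return set()


def disambiguate(sentence: List[Token], rules: List[Rule]) -> List[Token]:
    """Apply the rules repeatedly until no rule removes any further tag."""
    changed = True
    while changed:
        changed = False
        for i, token in enumerate(sentence):
            for rule in rules:
                to_remove = rule(sentence, i) & token.tags
                # Never remove the last remaining tag of a word.
                if to_remove and len(token.tags - to_remove) >= 1:
                    token.tags -= to_remove
                    changed = True
    return sentence


# "Petr se směje" ("Petr laughs"); the tag names are illustrative only.
sentence = [
    Token("Petr", {"NOUN"}),
    Token("se", {"REFL", "PREP"}),
    Token("směje", {"VERB"}),
]
result = disambiguate(sentence, [no_preposition_without_nominal])
print([(t.form, sorted(t.tags)) for t in result])
```

In this sketch every word keeps at least one tag, so an overly aggressive rule can leave a word ambiguous but cannot leave it untagged. Whether the actual tagger uses the same safeguard is not stated in the abstract.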