Morphological Guesser of Czech Words

  • Authors:
  • Jaroslava Hlaváčová

  • Affiliations:
  • -

  • Venue:
  • TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
  • Year:
  • 2001

Abstract

When a corpus undergoes morphological analysis, some words always remain that the analyser cannot recognize (foreign names, misspellings, ...). A human reader, however, usually understands such texts even without knowing as many words as the analyser's lexicon contains. The language itself helps the reader to recognize unknown words: not only semantics and syntax but also the pure morphology of an unknown word can contribute to its understanding. In this article, I describe a "guesser" that can lower the number of words left unrecognized after the "classical" morphological analysis of Czech texts. It was tested on the Czech National Corpus.