Morphological tagging: data vs. dictionaries

  • Authors: Jan Hajič
  • Affiliations: Johns Hopkins University, Baltimore, MD
  • Venue: NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
  • Year: 2000

Abstract

Part-of-speech tagging for English seems to have reached human levels of error, but full morphological tagging for inflectionally rich languages, such as Romanian, Czech, or Hungarian, is still an open problem, and the results are far from satisfactory. This paper presents results obtained by using a universalized exponential feature-based model for five such languages. It focuses on the data sparseness issue, which is especially severe for such languages (all the more so because no extensive annotated data exist for them). In conclusion, we argue strongly that, under such circumstances, using an independent morphological dictionary is preferable to adding more annotated data.
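The following is a minimal sketch of one way a morphological dictionary can plug into an exponential (log-linear) tagging model. It assumes that the dictionary supplies the set of admissible tags for each word form and that the model only disambiguates among those tags. Words missing from the dictionary fall back to the full tagset. The tagset, dictionary entries, feature weights, and helper names (`tag_distribution`, `greedy_tag`) are all hypothetical illustrations. They are not the paper's model, tagset, or data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical, simplified positional tags (NOT the paper's tagset).
TAGSET: List[str] = ["NNIS1", "NNIS4", "VB1S", "AAIS1", "AAIS4"]

# Hypothetical morphological dictionary: word form -> tags the analyzer allows.
DICTIONARY: Dict[str, List[str]] = {
    "hrad": ["NNIS1", "NNIS4"],  # noun form ambiguous between two cases
    "vidím": ["VB1S"],           # unambiguous verb form
}

# Hypothetical learned weights (lambda_i) for (context feature, tag) pairs.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("prev=VB1S", "NNIS4"): 1.5,
    ("prev=<s>", "NNIS1"): 0.8,
    ("prev=<s>", "VB1S"): 0.5,
}


def features(prev_tag: str) -> List[str]:
    """Context features; only the previous tag here, for brevity."""
    return [f"prev={prev_tag}"]


def tag_distribution(word: str, prev_tag: str) -> Dict[str, float]:
    """p(t | ctx) = exp(sum_i lambda_i * f_i(t, ctx)) / Z, with Z summed over candidate tags only."""
    # The dictionary restricts the candidates; unknown words get the full tagset.
    candidates = DICTIONARY.get(word, TAGSET)
    feats = features(prev_tag)
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}


def greedy_tag(words: List[str]) -> List[str]:
    """Left-to-right greedy decoding (a simplification of real tagger search)."""
    tags: List[str] = []
    prev = "<s>"
    for w in words:
        dist = tag_distribution(w, prev)
        best = max(dist, key=dist.get)
        tags.append(best)
        prev = best
    return tags


if __name__ == "__main__":
    print(greedy_tag(["vidím", "hrad"]))  # ['VB1S', 'NNIS4']
```

In this sketch, the dictionary does work that would otherwise have to be learned from annotated data. A form never seen in training still gets a small, correct candidate set, so the sparse training counts only need to choose among a few tags. In contrast, an unknown form must be scored against the entire tagset.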