Broad coverage automatic morphological segmentation of German words

  • Authors:
  • T. Pachunke;O. Mertineit;K. Wothke;R. Schmidt

  • Affiliations:
  • IBM Germany, Heidelberg Scientific Center, Heidelberg;IBM Germany, Heidelberg Scientific Center, Heidelberg;IBM Germany, Heidelberg Scientific Center, Heidelberg;IBM Germany, Heidelberg Scientific Center, Heidelberg

  • Venue:
  • COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
  • Year:
  • 1992

Abstract

A system for the automatic segmentation of German words into morphs was developed. The main linguistic knowledge sources used by the system are a word syntax and a morph dictionary. The syntax is written in the formalism of right-linear regular grammars and comprises approximately 1,400 rules describing the sequences of morph classes that underlie syntactically well-formed words. The morph dictionary contains almost 11,000 morphs. Each morph is assigned to up to 6 morph classes.

Statistical evaluations with 6,000 text words showed that more than 99% of the segmented words received a correct segmentation.
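
To make the interplay of the two knowledge sources concrete, the following is a minimal Python sketch of dictionary-driven segmentation constrained by a right-linear grammar over morph classes. It is not the authors' implementation: the morph entries, the class names (`prefix`, `stem`, `suffix`, `inflection`), the nonterminals, and the function `segment` are all hypothetical toy assumptions, orders of magnitude smaller than the roughly 1,400 rules and 11,000 morphs reported above.

```python
# Toy morph dictionary: morph -> tuple of morph classes (hypothetical entries).
# The abstract reports up to 6 classes per morph; here at most 2 are used.
MORPHS = {
    "haus": ("stem",),
    "frau": ("stem",),
    "kauf": ("stem",),
    "ver": ("prefix",),
    "er": ("suffix", "inflection"),
    "en": ("inflection",),
}

# Toy right-linear word syntax (hypothetical): each rule A -> c B is stored as
# RULES[A] = [(c, B), ...], where c is a morph class and A, B are nonterminals.
RULES = {
    "Word": [("prefix", "Stem"), ("stem", "AfterStem")],
    "Stem": [("stem", "AfterStem")],
    "AfterStem": [
        ("stem", "AfterStem"),       # compounding
        ("suffix", "AfterSuffix"),   # derivation
        ("inflection", "End"),
    ],
    "AfterSuffix": [("inflection", "End")],
}

# Nonterminals at which a word may end (stands in for terminating rules A -> c).
FINAL = {"AfterStem", "AfterSuffix", "End"}


def segment(word, state="Word"):
    """Return every segmentation of `word` licensed by MORPHS and RULES.

    Each segmentation is a list of (morph, morph_class) pairs.
    """
    if not word:
        # An empty remainder is acceptable only in a final state.
        return [[]] if state in FINAL else []

    results = []
    for end in range(1, len(word) + 1):
        morph = word[:end]
        for morph_class in MORPHS.get(morph, ()):
            for rule_class, next_state in RULES.get(state, ()):
                if rule_class != morph_class:
                    continue
                # Recurse on the remainder of the word from the next state.
                for rest in segment(word[end:], next_state):
                    results.append([(morph, morph_class)] + rest)
    return results
```

With these toy tables, `segment("hausfrauen")` yields the single analysis `haus` (stem) + `frau` (stem) + `en` (inflection), while `segment("en")` yields no analysis because the toy grammar does not allow a word to begin with an inflection. The exhaustive search returns all licensed segmentations, so a real system would still need some way to handle ambiguous words; the abstract does not say how that is done.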