Evaluating an 'off-the-shelf' POS-tagger on early modern German text

  • Authors: Silke Scheible; Richard J. Whitt; Martin Durrell; Paul Bennett
  • Affiliations: Cultures University of Manchester (all four authors)
  • Venue: LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
  • Year: 2011

Abstract

The goal of this study is to evaluate an 'off-the-shelf' POS tagger for modern German on historical data from the Early Modern period (1650-1800). Since no specialised tagger is available for this stage of the language, our findings will be of particular interest to smaller, humanities-based projects that wish to add POS annotations to their historical data but lack the means or resources to train a POS tagger themselves. Our study assesses the effects of spelling variation on the performance of the tagger and investigates to what extent tagger performance can be improved by using 'normalised' input, in which spelling variants in the corpus are standardised to a modern form. Our findings show that adding such a normalisation layer improves tagger performance considerably.
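
The comparison described in the abstract (tagging the original spelling versus tagging normalised input, then scoring both against gold-standard tags) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the variant-to-modern mapping, the lookup-based `toy_tag` function standing in for a real modern-German tagger, the tag labels, and the four-token example. The resulting numbers describe only this toy data and are not results from the paper.

```python
from typing import Dict, List

# Hypothetical mapping from historical spelling variants to modern forms.
# A real project would curate such a list by hand or derive it from a lexicon.
NORMALISATION: Dict[str, str] = {"vnd": "und", "seyn": "sein"}

# Hypothetical lexicon for a stand-in "tagger" that only knows modern spellings.
# The tag labels are illustrative placeholders.
TOY_LEXICON: Dict[str, str] = {
    "und": "KON",
    "er": "PPER",
    "will": "VMFIN",
    "sein": "VAINF",
}


def normalise(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each token that has a known modern form; keep the rest unchanged."""
    return [mapping.get(token, token) for token in tokens]


def toy_tag(tokens: List[str]) -> List[str]:
    """Stand-in for an off-the-shelf tagger: look up each token, else 'UNK'."""
    return [TOY_LEXICON.get(token, "UNK") for token in tokens]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Share of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented example sentence with gold tags (not taken from the paper's corpus).
tokens = ["vnd", "er", "will", "seyn"]
gold = ["KON", "PPER", "VMFIN", "VAINF"]

raw_accuracy = accuracy(toy_tag(tokens), gold)
normalised_accuracy = accuracy(toy_tag(normalise(tokens, NORMALISATION)), gold)

print(f"accuracy on original spelling: {raw_accuracy:.2f}")
print(f"accuracy on normalised input:  {normalised_accuracy:.2f}")
```

In practice `toy_tag` would be replaced by a call to whichever existing tagger is being evaluated. The normalisation step and the accuracy computation stay the same, so the two scores remain directly comparable.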