Evaluating an 'off-the-shelf' POS-tagger on early modern German text

Authors:
Silke Scheible;Richard J. Whitt;Martin Durrell;Paul Bennett
Affiliations:
Cultures University of Manchester;Cultures University of Manchester;Cultures University of Manchester;Cultures University of Manchester
Venue:
LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Year:
2011

Citing 3

TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Comparing canonicalizations of historical German text
SIGMORPHON '10 Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology

A gold standard corpus of early modern German
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop

Cited 1

A framework for retrieval and annotation in digital humanities using XQuery full text and update in BaseX
Proceedings of the 2012 ACM symposium on Document engineering

Abstract

The goal of this study is to evaluate an 'off-the-shelf' POS tagger for modern German on historical data from the Early Modern period (1650-1800). With no specialised tagger available for this particular stage of the language, our findings will be of particular interest to smaller, humanities-based projects that wish to add POS annotations to their historical data but lack the means or resources to train a POS tagger themselves. Our study assesses the effects of spelling variation on the performance of the tagger, and investigates to what extent tagger performance can be improved by using 'normalised' input, where spelling variants in the corpus are standardised to a modern form. Our findings show that adding such a normalisation layer improves tagger performance considerably.
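The comparison described in the abstract (tagging the original spellings versus tagging normalised input, then scoring both against gold tags) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' setup: `toy_tagger` is a hypothetical lexicon lookup standing in for a real off-the-shelf tagger, the normalisation table and the four-token sample are invented for the example, and the STTS-style tag names are assumed rather than taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an off-the-shelf tagger trained on modern German:
# a tiny lexicon of modern forms with STTS-style tags (assumed tagset).
MODERN_LEXICON: Dict[str, str] = {
    "und": "KON",
    "die": "ART",
    "teil": "NN",
    "sein": "VAINF",
}

# Hypothetical normalisation table: historical spelling -> modern form.
NORMALISATION: Dict[str, str] = {
    "vnd": "und",
    "theil": "Teil",
    "seyn": "sein",
}


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    """Tag each token by lexicon lookup; unknown tokens get the tag 'XY'."""
    return [MODERN_LEXICON.get(tok.lower(), "XY") for tok in tokens]


def normalise(tokens: Sequence[str]) -> List[str]:
    """Replace known spelling variants with their modern form."""
    return [NORMALISATION.get(tok.lower(), tok) for tok in tokens]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def compare(
    tokens: Sequence[str],
    gold: Sequence[str],
    tagger: Callable[[Sequence[str]], List[str]],
) -> Dict[str, float]:
    """Score the tagger on the original tokens and on normalised tokens."""
    return {
        "original": accuracy(tagger(tokens), gold),
        "normalised": accuracy(tagger(normalise(tokens)), gold),
    }


# Invented sample; the gold tags are illustrative only.
tokens = ["vnd", "die", "Theil", "seyn"]
gold = ["KON", "ART", "NN", "VAINF"]
scores = compare(tokens, gold, toy_tagger)
print(scores)
```

In practice `toy_tagger` would be replaced by a call to whatever tagger a project already uses. Because normalisation is a separate preprocessing step, the tagger itself needs no retraining.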