A gold standard corpus of early modern German

  • Authors: Silke Scheible; Richard J. Whitt; Martin Durrell; Paul Bennett

  • Affiliations: University of Manchester; University of Manchester; University of Manchester; University of Manchester

  • Venue: LAW V '11: Proceedings of the 5th Linguistic Annotation Workshop
  • Year: 2011

Abstract

This paper describes an annotated gold standard sample corpus of Early Modern German containing over 50,000 tokens of text manually annotated with POS tags, lemmas, and normalised spelling variants. The corpus is the first resource of its kind for this variant of German, and represents an ideal test bed for evaluating and adapting existing NLP tools on historical data. We describe the corpus format, annotation levels, and challenges, providing an example of the requirements and needs of smaller humanities-based corpus projects.