Automatic linguistic annotation of historical language: ToTrTaLe and XIX century Slovene

  • Authors: Tomaž Erjavec
  • Affiliations: Jožef Stefan Institute, Jamova cesta, Ljubljana, Slovenia
  • Venue: LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
  • Year: 2011

Abstract

The paper describes a tool for processing historical (Slovene) text. It annotates the words of a TEI-encoded corpus with their modern-day equivalents, morphosyntactic tags, and lemmas. Such a tool is useful for developing historical corpora of highly inflecting languages, enabling full-text search in digital libraries of historical texts, modernising such texts for today's readers, and simplifying the correction of OCR transcriptions.
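
To make the kind of annotation described in the abstract concrete, the following is a minimal sketch and not the tool's actual implementation. It assumes a tiny hand-written lexicon (the `LEXICON` dictionary, its single entry, and the tag value are illustrative assumptions), un-namespaced `<w>` elements, and the attribute names `norm`, `lemma`, and `msd`; the markup conventions actually used by the paper may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon, for illustration only:
# historical word form -> (modern-day equivalent, lemma, morphosyntactic tag).
LEXICON = {
    "serce": ("srce", "srce", "Ncnsn"),
}


def annotate(tei_xml: str) -> str:
    """Add norm/lemma/msd attributes to every <w> element found in the lexicon.

    Words missing from the lexicon are left unchanged. Real TEI documents use
    the TEI namespace; it is omitted here to keep the sketch short.
    """
    root = ET.fromstring(tei_xml)
    for w in root.iter("w"):
        entry = LEXICON.get((w.text or "").lower())
        if entry is None:
            continue
        norm, lemma, msd = entry
        w.set("norm", norm)    # modern-day equivalent of the historical form
        w.set("lemma", lemma)  # base (dictionary) form
        w.set("msd", msd)      # morphosyntactic description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotate("<s><w>Serce</w><w>neznano</w></s>"))
```

Storing the modern-day form as an attribute, rather than overwriting the token text, keeps the original transcription intact while still letting a search index match on the normalised form.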