Comparing canonicalizations of historical German text

Authors:
Bryan Jurish
Affiliations:
Berlin-Brandenburg Academy of Sciences, Berlin, Germany
Venue:
SIGMORPHON '10 Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology
Year:
2010


Abstract

Historical text presents numerous challenges for contemporary natural language processing techniques. In particular, the absence of consistent orthographic conventions in historical text presents difficulties for any system requiring reference to a static lexicon accessed by orthographic form. In this paper, we present three methods for associating unknown historical word forms with synchronically active canonical cognates and evaluate their performance on an information retrieval task over a manually annotated corpus of historical German verse.
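The abstract does not spell out the three methods, so the following is only a minimal illustrative sketch, assuming a plain edit-distance lookup against a small hypothetical lexicon. The names `LEXICON`, `canonicalize`, `conflation_pr`, `TOKENS`, and `GOLD`, and the example word forms, are invented for illustration. This is not one of the paper's methods. It shows the general idea of mapping an unknown historical form such as *vnd* or *Thür* to a contemporary canonical form. It also shows one plausible way to score the resulting conflation as a retrieval task, with each gold canonical form treated as a query. It is not necessarily the paper's evaluation protocol.

```python
# Hypothetical toy lexicon of contemporary (canonical) German forms.
LEXICON = {"und", "Tür", "Teil", "sein", "tun"}


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def canonicalize(word: str, lexicon=LEXICON) -> str:
    """Known forms map to themselves. Unknown forms map to the nearest
    lexicon entry (case-insensitive), with ties broken alphabetically."""
    if word in lexicon:
        return word
    return min(sorted(lexicon),
               key=lambda w: (levenshtein(word.lower(), w.lower()), w))


def conflation_pr(tokens, gold, canon):
    """Micro-averaged precision and recall, where each gold canonical form
    is a query. Relevant tokens have that gold form. Retrieved tokens are
    those that `canon` maps to it."""
    tp = retrieved = relevant = 0
    for q in {gold[t] for t in tokens}:
        rel = {i for i, t in enumerate(tokens) if gold[t] == q}
        ret = {i for i, t in enumerate(tokens) if canon(t) == q}
        tp += len(rel & ret)
        relevant += len(rel)
        retrieved += len(ret)
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall


# Hypothetical example tokens with a hypothetical gold annotation.
TOKENS = ["vnd", "Thür", "seyn", "thun", "Theil", "und"]
GOLD = {"vnd": "und", "Thür": "Tür", "seyn": "sein",
        "thun": "tun", "Theil": "Teil", "und": "und"}

if __name__ == "__main__":
    for t in TOKENS:
        print(t, "->", canonicalize(t))
    print("edit distance:", conflation_pr(TOKENS, GOLD, canonicalize))
    print("identity baseline:", conflation_pr(TOKENS, GOLD, lambda t: t))
```

On this toy data, the edit-distance lookup recovers every gold form, giving precision and recall of 1.0. The identity baseline retrieves only the already-modern token *und*, so its recall falls to 1/6. That gap illustrates the problem canonicalization is meant to close. A real system would need a full lexicon and a better-informed similarity measure than unit-cost edits.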