Parsing the past: identification of verb constructions in historical text

Authors:
Eva Pettersson;Beáta Megyesi;Joakim Nivre
Affiliations:
Swedish National Graduate School of Language Technology;Uppsala University;Uppsala University
Venue:
LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Year:
2012

Citing 5
Cited 0

TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Natural Language Processing Across Time: An Empirical Investigation on Italian

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
HunPos: an open source trigram tagger

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Extending the tool, or how to annotate historical language varieties

LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Automatic verb extraction from historical Swedish texts

LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Quantified Score

Hi-index	0.00

Visualization

Abstract

Even though NLP tools are widely used for contemporary text today, there is a lack of tools that can handle historical documents. Such tools could greatly facilitate the work of researchers dealing with large volumes of historical texts. In this paper we propose a method for extracting verbs and their complements from historical Swedish text, using NLP tools and dictionaries developed for contemporary Swedish and a set of normalisation rules that are applied before tagging and parsing the text. When evaluated on a sample of texts from the period 1550--1880, this method identifies verbs with an F-score of 77.2% and finds a partially or completely correct set of complements for 55.6% of the verbs. Although these results are in general lower than for contemporary Swedish, they are strong enough to make the approach useful for information extraction in historical research. Moreover, the exact match rate for complete verb constructions is in fact higher for historical texts than for contemporary texts (38.7% vs. 30.8%).