When printed hypertexts go digital: information extraction from the parsing of indices

  • Authors:
  • Matteo Romanello;Monica Berti;Alison Babeu;Gregory Crane

  • Affiliations:
  • The Perseus Project, Tufts University, Medford, MA, USA (all four authors)

  • Venue:
  • Proceedings of the 20th ACM conference on Hypertext and hypermedia
  • Year:
  • 2009

Abstract

Modern critical editions of ancient works generally include manually created indices of the other sources quoted in the text. Because such indices can be regarded as a form of domain-specific language, this paper presents a parsing-based approach to extracting information from them, in support of creating a collection of fragmentary texts. The paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations, with the aim of extracting information from potentially noisy input.
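
To make the idea of tolerant parsing over noisy OCR concrete, here is a minimal sketch in Python. It is not the parser described in the paper. The entry layout (`Author Work locus : page, page`), the names `IndexEntry`, `parse_line` and `parse_index`, and the table of OCR confusions are all assumptions chosen for illustration. The sketch extracts what it can recognise and silently skips lines that do not fit the pattern, which is one simple way to approximate fuzzy parsing.

```python
import re
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    """One parsed index line (hypothetical structure, not from the paper)."""
    author: str
    work: str
    locus: str
    pages: List[int]


# Assumed OCR confusions inside numeric fields: letters misread for digits.
OCR_DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})

# Assumed entry layout: "Hom. Il. 1.1 : 23, 45".
# The numeric fields deliberately accept the confusable letters as well.
ENTRY = re.compile(
    r"^\s*(?P<author>[A-Z][a-z]+\.?)\s+"
    r"(?P<work>[A-Z][a-z]+\.?)\s+"
    r"(?P<locus>[0-9lIOo]+(?:[.,][0-9lIOo]+)*)\s*"
    r"[:;]\s*(?P<pages>.+)$"
)


def parse_line(line: str) -> Optional[IndexEntry]:
    """Parse one line; return None when the line does not look like an entry."""
    match = ENTRY.match(line)
    if match is None:
        return None
    # Repair digit confusions and normalise "," to "." inside the locus.
    locus = match.group("locus").translate(OCR_DIGIT_FIXES).replace(",", ".")
    pages = []
    for token in re.split(r"[,\s]+", match.group("pages").strip()):
        token = token.translate(OCR_DIGIT_FIXES)
        # Keep only tokens that are plain numbers after repair; drop the rest.
        if re.fullmatch(r"[0-9]+", token):
            pages.append(int(token))
    return IndexEntry(match.group("author"), match.group("work"), locus, pages)


def parse_index(text: str) -> List[IndexEntry]:
    """Parse every recognisable entry and skip headers and OCR debris."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None]
```

Given the input lines `INDEX LOCORUM`, `Hom. Il. 1.l : 23, 4O` and `~~ 312 ~~`, this sketch keeps only the middle line and repairs the misread digits in its locus and page list. A real index would need a richer grammar (ranges, nested works, multi-line entries), which this example does not attempt.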