Citations in the digital library of classics: extracting canonical references by using conditional random fields

  • Authors:
  • Matteo Romanello; Federico Boschetti; Gregory Crane

  • Affiliations:
  • The Perseus Project, Medford, MA; The Perseus Project, Medford, MA; The Perseus Project, Medford, MA

  • Venue:
  • NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
  • Year:
  • 2009

Abstract

Scholars of Classics cite ancient texts by using abridged citations called canonical references. In the scholarly digital library, canonical references create a complex textile of links between ancient and modern sources, reflecting the deep hypertextual nature of texts in this field. This paper aims to demonstrate the suitability of Conditional Random Fields (CRF) for extracting this particular kind of reference from unstructured texts in order to enhance the capabilities of navigating and aggregating scholarly electronic resources. In particular, we developed a parser which recognizes word-level n-grams of a text as canonical references by using a CRF model trained with both positive and negative examples.
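To make the last step concrete, here is a minimal sketch of how word-level tokens might be turned into per-token features and labels before being handed to a CRF toolkit. It is not the authors' implementation: the feature names, the BIO-style labels (`B-REF`, `I-REF`, `O`), the helper functions, and the two example sentences are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

def token_features(tokens, i):
    """Build a feature dict for tokens[i]. Feature names are hypothetical."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        # Abbreviated author and work names often start with a capital ...
        "is_capitalized": tok[:1].isupper(),
        # ... and end with a period, e.g. "Hom." or "Il."
        "ends_with_period": tok.endswith("."),
        # Dotted numeric scope such as "1.1" or "12.34.5"
        "looks_like_scope": bool(re.fullmatch(r"\d+(\.\d+)*[.,;]?", tok)),
        # Neighbouring tokens give the model local context
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sentence_features(tokens):
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]

# Positive example (assumed): the n-gram "Hom. Il. 1.1" is a canonical reference.
positive_tokens = "as in Hom. Il. 1.1 the poet".split()
positive_labels = ["O", "O", "B-REF", "I-REF", "I-REF", "O", "O"]

# Negative example (assumed): no canonical reference at all.
negative_tokens = "the poet sings of wrath".split()
negative_labels = ["O"] * len(negative_tokens)

# Parallel feature/label sequences, the shape most CRF toolkits expect.
training_data = [
    (sentence_features(positive_tokens), positive_labels),
    (sentence_features(negative_tokens), negative_labels),
]
```

Each entry of `training_data` pairs one feature sequence with one label sequence. Including sentences labeled entirely `O` is one plausible reading of "negative examples"; the paper itself should be consulted for the actual training setup.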