Automatic annotation of bibliographical references in digital humanities books, articles and blogs

  • Authors:
  • Young-Min Kim; Patrice Bellot; Elodie Faath; Marin Dacos

  • Affiliations:
  • University of Avignon, Avignon, France; University of Avignon, Avignon, France; CLEO, Centre for Open Electronic Publishing, Marseille, France; CLEO, Centre for Open Electronic Publishing, Marseille, France

  • Venue:
  • Proceedings of the 4th ACM workshop on Online books, complementary social media and crowdsourcing
  • Year:
  • 2011

Abstract

In this paper, we address the problem of extracting and processing useful information from bibliographic references in Digital Humanities (DH) data. A machine learning technique for sequential data analysis, the Conditional Random Field (CRF), is applied to a corpus extracted from the OpenEdition site, a web platform for journals and book collections in the humanities and social sciences. We present our ongoing project toward this goal, which begins with the construction of a suitable corpus and an efficient CRF model trained on it. The project is supported by a Google Grant for Digital Humanities. We conduct a number of experiments to find one of the best settings for a CRF model on the corpus, and we verify these settings through both automatic and manual evaluation.
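
As a rough illustration of what annotating references with a CRF involves, the sketch below turns one reference string into per-token feature dictionaries, the usual input to a linear-chain CRF. The tokenizer, the feature names, and the example reference are all assumptions made for illustration; they are not the features or data used in the paper. In training, each token would also carry a field label, and a CRF toolkit would learn to predict those labels from features like these.

```python
import re


def tokenize(reference):
    """Split a reference string into word and punctuation tokens (illustrative)."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Build a feature dict for tokens[i]. Feature names are hypothetical."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_digits": tok.isdigit(),
        # Crude year pattern (1500-2099), assumed here only as an example.
        "is_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "is_punct": not any(ch.isalnum() for ch in tok),
    }
    # Neighbouring tokens give the sequence model local context.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def reference_to_features(reference):
    """Return the tokens of a reference and one feature dict per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up reference, not taken from the OpenEdition corpus.
    toks, feats = reference_to_features("Dupont, J. 1998. Histoire de Marseille.")
    for tok, f in zip(toks, feats):
        print(tok, f)
```

Punctuation is kept as separate tokens because separators such as commas and periods often mark field boundaries in a reference, which is exactly the kind of cue a sequence model can exploit.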