Automatic annotation of bibliographical references in digital humanities books, articles and blogs

Authors:
Young-Min Kim;Patrice Bellot;Elodie Faath;Marin Dacos
Affiliations:
University of Avignon, Avignon, France;University of Avignon, Avignon, France;CLEO, Centre for Open Electronic Publishing, Marseille, France;CLEO, Centre for Open Electronic Publishing, Marseille, France
Venue:
Proceedings of the 4th ACM workshop on Online books, complementary social media and crowdsourcing
Year:
2011

Abstract

In this paper, we address the problem of extracting and processing useful information from bibliographic references in Digital Humanities (DH) data. A machine learning technique for sequential data analysis, the Conditional Random Field (CRF), is applied to a corpus extracted from the OpenEdition site, a web platform for journals and book collections in the humanities and social sciences. We present our ongoing project on this problem, which includes, as a preliminary step, the construction of a suitable corpus and of an efficient CRF model trained on it. This project is supported by a Google Grant for Digital Humanities. A number of experiments are conducted to find one of the best settings for a CRF model on the corpus, and we verify these settings through both automatic and manual evaluation.
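
To illustrate what treating a reference as sequential data can look like in practice, the sketch below turns a reference string into a token sequence with one feature dictionary per token, which is the kind of input a CRF sequence labeler typically consumes. Everything in it is an assumption made for illustration: the tokenizer, the feature names, the year pattern, and the sample reference string are invented here and are not taken from the paper's corpus or model.

```python
import re

def tokenize(reference):
    """Split a reference string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference, flags=re.UNICODE)

def token_features(tokens, i):
    """Build a hypothetical feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_digits": tok.isdigit(),
        # Assumed heuristic: four digits between 1500 and 2099 look like a year.
        "looks_like_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "is_punct": not any(ch.isalnum() for ch in tok),
        # Neighbouring tokens give the labeler local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def reference_to_sequence(reference):
    """Return (tokens, features), aligned position by position."""
    tokens = tokenize(reference)
    features = [token_features(tokens, i) for i in range(len(tokens))]
    return tokens, features

# Invented sample reference, for illustration only.
sample = "Dupont, J. 1998. Histoire du livre. Paris."
tokens, features = reference_to_sequence(sample)
```

A CRF would then be trained to assign one label per token (for example author, date, or title; the label set is likewise an assumption here), using these per-token features together with the dependencies between adjacent labels.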