Kairos: proactive harvesting of research paper metadata from scientific conference web sites

  • Authors:
  • Markus Hänse; Min-Yen Kan; Achim P. Karduck

  • Affiliations:
  • Hochschule Furtwangen University; Department of Computer Science, National University of Singapore; Hochschule Furtwangen University

  • Venue:
  • ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
  • Year:
  • 2010

Abstract

We investigate the automatic harvesting of research paper metadata from recent scholarly events. Our system, Kairos, combines a focused crawler and an information extraction engine to convert a list of conference websites into an index of metadata fields for individual papers. Using event date metadata extracted from each conference website, Kairos proactively harvests metadata about the individual papers soon after they are made public. We use a Maximum Entropy classifier to classify uniform resource locators (URLs) as scientific conference websites, and Conditional Random Fields (CRF) to extract individual paper metadata from such websites. Experiments show that each of the two components achieves an acceptable classification accuracy of over 95%.
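The abstract does not describe the URL classifier's features or training setup. As a rough illustration only, the sketch below shows a binary maximum entropy model (for two classes this is equivalent to logistic regression) that labels URLs as conference websites. Everything specific here is an assumption rather than Kairos's actual design: the URL-token features, the stop list, the omitted bias feature, the learning rate and epoch count, the toy training URLs, and the hypothetical helper names `url_features`, `train_maxent`, and `is_conference_site`.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop list (assumption): tokens present in almost every URL carry no signal.
STOP_TOKENS = {"http", "https", "www"}


def url_features(url: str) -> set[str]:
    """Binary features: lowercase letter runs and digit runs, e.g. 'icadl2010' -> 'icadl', '2010'."""
    tokens = re.findall(r"[a-z]+|[0-9]+", url.lower())
    return {t for t in tokens if t not in STOP_TOKENS}


def prob_conference(weights: dict[str, float], feats: set[str]) -> float:
    """P(conference | url) = sigmoid(sum of active feature weights); unseen features weigh 0."""
    score = sum(weights.get(f, 0.0) for f in feats)
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # numerically stable branch for negative scores
    return e / (1.0 + e)


def train_maxent(examples: list[tuple[str, int]], epochs: int = 100, lr: float = 0.5) -> dict[str, float]:
    """Batch gradient ascent on the conditional log-likelihood (sketch: no bias, no regularization)."""
    data = [(url_features(url), label) for url, label in examples]
    weights: dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        grad: dict[str, float] = defaultdict(float)
        for feats, label in data:
            p = prob_conference(weights, feats)
            for f in feats:
                grad[f] += label - p  # observed minus expected feature count
        for f, g in grad.items():
            weights[f] += lr * g
    return dict(weights)


def is_conference_site(weights: dict[str, float], url: str) -> bool:
    """Label a URL as a conference website when the model's probability exceeds 0.5."""
    return prob_conference(weights, url_features(url)) > 0.5


# Toy training data (invented for illustration): 1 = conference site, 0 = other.
TRAIN = [
    ("http://www.icadl2010.org/accepted-papers", 1),
    ("http://www.sigir2010.org/program", 1),
    ("http://www.jcdl2010.org/papers", 1),
    ("http://www.shop.com/cart", 0),
    ("http://news.example.com/sports", 0),
    ("http://www.recipes.com/dessert", 0),
]

if __name__ == "__main__":
    w = train_maxent(TRAIN)
    print(is_conference_site(w, "http://www.ecdl2010.org/papers"))  # True
    print(is_conference_site(w, "http://www.shop.com/dessert"))     # False
```

Because the toy positive and negative URLs share no tokens, every token weight takes the sign of its class after the first epoch, which makes the example easy to follow. On real crawl data, tokens overlap across classes, so a bias feature, regularization, and held-out evaluation would matter.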