Scraping the ACM Digital Library

  • Authors: Donna Bergmark; Paradee Phempoonpanich; Shumin Zhao

  • Affiliations: Cornell Digital Library Research Group, Cornell University, Ithaca, NY; Computer Science Dept., Cornell University, Ithaca, NY; Computer Science Dept., Cornell University, Ithaca, NY

  • Venue: ACM SIGIR Forum
  • Year: 2001

Abstract

As part of a larger project to automatically reference-link the online scholarly literature, an attempt to analyze PDF documents was undertaken. The ACM Digital Library was used as the corpus for these experiments. With the current PDF and HTML analysis tools, roughly 80% accuracy was obtained in the automatic extraction of reference-linking information.