Harvesting needed to maintain scientific literature online

  • Authors:
  • Nikolay Nikolov;Peter Stoehr

  • Affiliations:
  • European Bioinformatics Institute, Cambridge, United Kingdom;European Bioinformatics Institute, Cambridge, United Kingdom

  • Venue:
  • Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Millions of scientific articles are accessible freely on the web. While some of them are stored in institutional repositories many are made available on personal pages which are exposed to the net's transience. We found that nearly 11% of URLs of PDF documents containing references to life science publications were not accessible within 5 months after being harvested using a search engine's (SE) API. For most of them (8.4%) no SE cache backup could be found. Although we have yet to estimate the exact rate at which the scientific literature disappears and the duration of its disappearance the results so far are a clear indicator that web harvesting is needed to preserve the online scientific literature.