Millions of scientific articles are freely accessible on the web. While some of them are stored in institutional repositories, many are made available on personal pages, which are exposed to the net's transience. We found that nearly 11% of the URLs of PDF documents containing references to life science publications were no longer accessible within 5 months of being harvested using a search engine (SE) API. For most of them (8.4%), no SE cache backup could be found. Although we have yet to estimate the exact rate at which the scientific literature disappears and the duration of its disappearance, the results so far are a clear indicator that web harvesting is needed to preserve the online scientific literature.
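The kind of accessibility check described above could be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names (`fetch_status`, `is_accessible`, `inaccessible_rate`) are hypothetical, and treating any final HTTP status from 200 to 399 as "accessible" is an assumption made here for simplicity.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was obtained."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def is_accessible(status: Optional[int]) -> bool:
    """Assumption: a status in the 200-399 range counts as accessible."""
    return status is not None and 200 <= status < 400


def inaccessible_rate(
    urls: Iterable[str],
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> float:
    """Fraction of urls whose status lookup indicates the document is gone."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    missing = sum(1 for url in url_list if not is_accessible(get_status(url)))
    return missing / len(url_list)
```

Passing the status lookup as a parameter keeps the rate calculation independent of the network, so the same function can be run against a recorded set of harvest results. Checking for a search engine cache copy is not covered here, since that depends on the specific SE API.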