EverLast: a distributed architecture for preserving the web

  • Authors:
  • Avishek Anand;Srikanta Bedathur;Klaus Berberich;Ralf Schenkel;Christos Tryfonopoulos

  • Affiliations:
  • Max-Planck Institute for Informatics, Saarbrücken, Germany;Max-Planck Institute for Informatics, Saarbrücken, Germany;Max-Planck Institute for Informatics, Saarbrücken, Germany;Saarland University, Saarbrücken, Germany;Max-Planck Institute for Informatics, Saarbrücken, Germany

  • Venue:
  • Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2009

Abstract

The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral, with an estimated 50-80% of content changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA), and the European Archive (EA) have been working tirelessly towards preserving the ever-changing Web. However, while these web archiving efforts have paid significant attention to the long-term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web archives, we propose EverLast, a scalable distributed framework for next-generation Web archival and temporal text analytics over the archive. Our system is built on a loosely coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts undertaken mainly at a national level by national digital libraries. Key features of EverLast include support for time-based text search and analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast and present some promising preliminary results.