The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral, with an estimated 50-80% of content changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA), and the European Archive (EA) have been working tirelessly towards preserving the ever-changing Web. However, while these web archiving efforts have paid significant attention to long-term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web archives, we propose EverLast, a scalable distributed framework for next-generation Web archival and temporal text analytics over the archive. Our system is built on a loosely coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts undertaken mainly at a national level by national digital libraries. Key features of EverLast include support for time-based text search and analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast and present some promising preliminary results.