A survey of web archive search architectures

Authors:
Miguel Costa;Daniel Gomes;Francisco Couto;Mário Silva
Affiliations:
Foundation for National Scientific Computing & University of Lisbon, Lisbon, Portugal;Foundation for National Scientific Computing, Lisbon, Portugal;LaSIGE, Lisbon, Portugal;IST/INESC-ID, Lisbon, Portugal
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Abstract

Web archives already hold more than 282 billion documents, and users demand full-text search to explore this historical information. This survey provides an overview of web archive search architectures designed for time-travel search, i.e., full-text search on the web within a user-specified time interval. Performance, scalability and ease of management are important aspects to take into consideration when choosing a system architecture. We compare these aspects and initiate the discussion of which search architecture is most suitable for a large-scale web archive.
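
To make the notion of time-travel search concrete, the following is a minimal sketch of a single-node inverted index whose postings carry a validity interval, so that a keyword query can be restricted to a user-specified time interval. It is an illustration only and is not taken from the surveyed systems: the names `Posting`, `TimeTravelIndex`, `add` and `search`, the integer timestamps, and the half-open validity interval `[start, end)` are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """One archived version of a page containing a term (hypothetical layout)."""
    doc_id: str  # identifier of the archived page, e.g. its URL
    start: int   # timestamp at which this version was first crawled
    end: int     # timestamp at which it was superseded (exclusive)


class TimeTravelIndex:
    """Toy in-memory inverted index with a validity interval per posting."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, doc_id, text, start, end):
        # Index each distinct term of this version once.
        for term in set(text.lower().split()):
            self._postings[term].append(Posting(doc_id, start, end))

    def search(self, query, t_from, t_to):
        """Return (doc_id, start) for versions that contain every query term
        and whose validity interval overlaps [t_from, t_to]."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            hits = {
                (p.doc_id, p.start)
                for p in self._postings.get(term, [])
                if p.start <= t_to and p.end > t_from  # interval overlap test
            }
            # Conjunctive (AND) semantics across query terms.
            result = hits if result is None else result & hits
        return result


if __name__ == "__main__":
    index = TimeTravelIndex()
    index.add("example.org/", "web archive search", 2005, 2008)
    index.add("example.org/", "web archive news", 2008, 2012)
    index.add("example.net/", "search engines", 2010, 2013)
    print(index.search("archive search", 2006, 2007))
```

This sketch scans every posting of each query term and filters by time afterwards. A large-scale archive would instead need to partition or organize the index (for example by time or by document) so that a query touches only the relevant part, which is where the performance, scalability and management trade-offs mentioned in the abstract arise.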