Design and selection criteria for a national web archive

Authors:
Daniel Gomes;Sérgio Freitas;Mário J. Silva
Affiliations:
Faculty of Sciences, University of Lisbon, Lisboa, Portugal;Faculty of Sciences, University of Lisbon, Lisboa, Portugal;Faculty of Sciences, University of Lisbon, Lisboa, Portugal
Venue:
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
Year:
2006

Abstract

Web archives and Digital Libraries are conceptually similar: both store and provide access to digital content. Loading documents into a Digital Library usually requires substantial intervention from human experts. Large collections of documents gathered from the web, however, must be loaded without human intervention. This paper analyzes strategies for selecting content for a national web archive and proposes a system architecture to support it.