Web archives and digital libraries are conceptually similar: both store digital content and provide access to it. Loading documents into a digital library usually requires substantial intervention by human experts. Large collections of documents gathered from the web, however, must be loaded without human intervention. This paper analyzes strategies for selecting content for a national web archive and proposes a system architecture to support it.