A First Experience in Archiving the French Web

Authors:
Serge Abiteboul;Gregory Cobena;Julien Masanes;Gerald Sedrati
Venue:
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Year:
2002

Citing 4

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Synchronizing a database to improve freshness
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Crawling the Hidden Web
Proceedings of the 27th International Conference on Very Large Data Bases

Change-Centric Management of Versions in an XML Warehouse
Proceedings of the 27th International Conference on Very Large Data Bases

Cited 10

Issues in Monitoring Web Data
DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications

Adaptive on-line page importance computation
WWW '03 Proceedings of the 12th international conference on World Wide Web

Digital libraries and engines of search: new information systems in the context of the digital preservation
EATIS '07 Proceedings of the 2007 Euro American conference on Telematics and information systems

Recovering a website's server components from the web infrastructure
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries

Using visual pages analysis for optimizing web archiving
Proceedings of the 2010 EDBT/ICDT Workshops

Vi-DIFF: understanding web pages changes
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I

Archiving the web using page changes patterns: a case study
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Incremental web-site boundary detection using random walks
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition

Design and selection criteria for a national web archive
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries

Identifying websites with flow simulation
ICWE'05 Proceedings of the 5th international conference on Web Engineering

Abstract

The web is an increasingly valuable source of information, and organizations are involved in archiving (portions of) it for various purposes, e.g., the Internet Archive (www.archive.org). A new mission of the French National Library (BnF) is the "dépôt légal" (legal deposit) of the French web. We describe here some preliminary work on the topic conducted by BnF and INRIA. In particular, we consider the acquisition of the web archive. The issues are the definition of the perimeter of the French web and the choice of pages to read once or several times (to take changes into account). Keeping several copies of the same page leads to versioning issues, which we briefly consider. Finally, we mention some first experiments.