We present Opal, a lightweight framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of "in vivo" preservation: it harnesses the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and can share that knowledge with other Opal servers by mutually harvesting one another using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). From cached copies of a missing page found on the web, Opal generates lexical signatures, which are then used to search for similar versions of the page. We present the architecture of the Opal framework, discuss a reference implementation, and report a quantitative analysis indicating that Opal could be deployed effectively.
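A lexical signature is a small set of terms chosen to identify a page well enough that a search-engine query built from them tends to retrieve it. The sketch below shows one common way to build such a signature: rank a cached copy's terms by TF-IDF weight and keep the top `k`. It is not Opal's implementation. The tokenizer, the add-one IDF smoothing, the `doc_freq`/`total_docs` inputs (standing in for document-frequency estimates from some corpus), and the default `k=5` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(page_text, doc_freq, total_docs, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    doc_freq maps term -> number of corpus documents containing it;
    total_docs is the corpus size. Both are assumed inputs, e.g. estimates
    obtained from a search engine or a reference corpus.
    """
    tf = Counter(tokenize(page_text))

    def idf(term):
        # Add-one smoothing keeps IDF finite for terms absent from doc_freq.
        return math.log((total_docs + 1) / (doc_freq.get(term, 0) + 1))

    # Sort by descending TF-IDF score, breaking ties alphabetically.
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]
```

For example, with `doc_freq = {"the": 1000, "web": 200, "opal": 1, "archive": 50}` and `total_docs = 1000`, the text `"the opal opal web archive the the"` yields `["opal", "archive", "web"]` for `k=3`: "the" occurs in every document, so its IDF (and score) is zero. Joining the result with spaces gives a query string to submit to a search engine.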
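Mutual harvesting relies on ordinary OAI-PMH requests. The sketch below shows the request/response cycle a harvester might use against a peer: issue `ListIdentifiers` (optionally with a `from` datestamp for incremental harvests), then follow `resumptionToken` values until the repository returns an empty one. It is a generic OAI-PMH 2.0 client fragment, not Opal's code. The base URL `http://example.org/oai` is hypothetical, and what metadata Opal servers actually expose is not specified here. Network I/O is left out so the parsing can be tested on a literal response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, resumption_token=None, from_date=None,
                         metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListIdentifiers request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed on/after this datestamp.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_identifiers(xml_text):
    """Return (identifiers, next_token); next_token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall("oai:ListIdentifiers/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken means no further pages.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvester would fetch `list_identifiers_url(base)`, parse it, and keep requesting `list_identifiers_url(base, resumption_token=token)` while `token` is not `None`.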