A Quantitative Evaluation of Dissemination-Time Preservation Metadata

Authors:
Joan A. Smith; Michael L. Nelson
Affiliations:
C.S. Dept, Old Dominion University, Norfolk VA 23529; C.S. Dept, Old Dominion University, Norfolk VA 23529
Venue:
ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
Year:
2008

Abstract

One of many challenges facing web preservation efforts is the lack of metadata available for web resources. In prior work, we proposed a model that takes advantage of a site's own web server to prepare its resources for preservation. When responding to a request from an archiving repository, the server applies a series of metadata utilities, such as Jhove and Exif, to the requested resource. The output from each utility is included in the HTTP response along with the resource itself. This paper addresses the question of feasibility: Is it in fact practical to use the site's web server as a just-in-time metadata generator, or does the extra processing create an unacceptable deterioration in server responsiveness to quotidian events? Our tests indicate that (a) this approach can work effectively for both the crawler and the server; and that (b) utility selection is an important factor in overall performance.
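
The dissemination-time model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three utilities below are hypothetical stand-ins built from the Python standard library (they do not invoke Jhove or Exif), and the function name build_response and the JSON-style packaging are assumptions made only for this illustration. The sketch runs each utility on the requested resource, bundles the outputs with the resource itself, and records how long each utility took, since the abstract identifies utility selection as a factor in overall performance.

```python
import base64
import hashlib
import json
import mimetypes
import time


def sha256_utility(name, data):
    """Stand-in utility (assumption): hex SHA-256 digest of the resource bytes."""
    return hashlib.sha256(data).hexdigest()


def mime_utility(name, data):
    """Stand-in utility (assumption): MIME type guessed from the file name."""
    mime, _encoding = mimetypes.guess_type(name)
    return mime or "application/octet-stream"


def size_utility(name, data):
    """Stand-in utility (assumption): resource size in bytes."""
    return len(data)


def build_response(name, data, utilities):
    """Run every utility on the resource and package the outputs with the content.

    Hypothetical helper: the packaging format here is illustrative only.
    """
    metadata = {}
    timings = {}
    for label, utility in utilities.items():
        start = time.perf_counter()
        metadata[label] = utility(name, data)
        # Per-utility cost, so that slow utilities are easy to spot.
        timings[label] = time.perf_counter() - start
    return {
        "resource": name,
        "content_base64": base64.b64encode(data).decode("ascii"),
        "metadata": metadata,
        "seconds_per_utility": timings,
    }


# Hypothetical utility set; a real deployment would choose its own tools.
UTILITIES = {"sha256": sha256_utility, "mime": mime_utility, "size": size_utility}

response = build_response("index.html", b"<html><body>hello</body></html>", UTILITIES)
print(json.dumps(response["metadata"], indent=2))
```

Keeping the utilities in a plain dictionary makes it straightforward to drop or swap an expensive utility and compare the recorded timings, which is the kind of trade-off the feasibility question in the abstract concerns.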