Towards mining past content of Web pages

Authors:
A. Jatowt;K. Tanaka
Affiliations:
Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan
Venue:
The New Review of Hypermedia and Multimedia - Web Archiving
Year:
2007

Citing 12
Cited 1

Automatic generation of overview timelines

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
OCELOT: a system for summarizing Web pages

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Seeing the whole in parts: text summarization for web browsing on handheld devices

Proceedings of the 10th international conference on World Wide Web
Topic Detection and Tracking: Event-Based Information Organization

Topic Detection and Tracking: Event-Based Information Organization
Austrian Online Archive Processing: Analyzing Archives of the World Wide Web

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Bursty and Hierarchical Structure in Streams

Data Mining and Knowledge Discovery
Information diffusion through blogspace

Proceedings of the 13th international conference on World Wide Web
Summarization of dynamic content in web collections

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Summarization of dynamic content in web collections

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Building a research library for the history of the web

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Evaluation of crawling policies for a web-repository crawler

Proceedings of the seventeenth conference on Hypertext and hypermedia
Lazy preservation: reconstructing websites by crawling the crawlers

WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management

Detecting age of page content

Proceedings of the 9th annual ACM international workshop on Web information and data management

Abstract

While much attention has recently focused on preserving the past content of the Web, efficient tools for utilizing the data stored in Web archives are still lacking. Web archives constitute large data sources that could be extensively analysed and mined for knowledge discovery. In this paper, we describe the issues involved in mining Web archive data. We discuss several concepts related to collecting and analysing the historical content of Web pages and briefly describe two knowledge discovery tasks: temporal summarization and object history detection.