Towards mining past content of Web pages

  • Authors:
  • A. Jatowt;K. Tanaka

  • Affiliations:
  • Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan

  • Venue:
  • The New Review of Hypermedia and Multimedia - Web Archiving
  • Year:
  • 2007

Abstract

While much attention has recently focused on preserving the past content of the Web, there is still a lack of efficient tools for utilizing data stored in Web archives. Web archives constitute large data sources that could be extensively analysed and mined for knowledge discovery. In this paper, we describe the issues involved in mining Web archive data. We discuss several concepts related to collecting and analysing the historical content of Web pages and briefly describe two knowledge discovery tasks: temporal summarization and object history detection.