PageChaser: A Tool for the Automatic Correction of Broken Web Links

  • Authors:
  • Atsuyuki Morishima; Akiyoshi Nakamizo; Toshinari Iida; Shigeo Sugimoto; Hiroyuki Kitagawa

  • Affiliations:
  • University of Tsukuba, Tsukuba, Ibaraki, Japan. mori@slis.tsukuba.ac.jp; Shibaura Institute of Technology, Tokyo, Japan / Hitachi, Ltd.; University of Tsukuba, Tsukuba, Ibaraki, Japan / Hitachi, Ltd.; University of Tsukuba, Tsukuba, Ibaraki, Japan. sugimoto@slis.tsukuba.ac.jp; University of Tsukuba, Tsukuba, Ibaraki, Japan. kitagawa@cs.tsukuba.ac.jp

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

PageChaser is a system that monitors links between Web pages and, when it finds a broken link, searches for the new location of the moved page. Searching for moved pages differs from typical information retrieval problems in two ways. First, the final destination cannot be identified until the page has actually moved, so an index-server approach is not necessarily effective. Second, the new address is strongly biased toward certain locations, so crawler-based solutions can be implemented effectively without searching the entire Web. PageChaser incorporates a comprehensive set of heuristics, some of which are novel, in a single unified framework. This paper explains the ideas underlying the design and development of PageChaser.
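
The locality bias described in the abstract can be illustrated with a small sketch. The abstract does not enumerate PageChaser's heuristics, so everything here is an assumption made for illustration: the function names `parent_urls` and `find_moved_page`, the choice of ancestor directories on the same host as crawl seeds, the caller-supplied `fetch_links` and `looks_like_target` callbacks, and the `max_pages` budget are all hypothetical and are not taken from the paper.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def parent_urls(broken_url):
    """Yield ancestor directories of broken_url on the same host, nearest first.

    Assumed heuristic (not from the paper): a moved page often stays
    close to its old location, so nearby directories are crawled first.
    """
    parts = urlsplit(broken_url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop the last segment (the missing page), then walk up to the site root.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def find_moved_page(broken_url, fetch_links, looks_like_target, max_pages=50):
    """Bounded breadth-first crawl seeded near the old location.

    fetch_links(url) returns the outgoing links of url (empty if unreachable).
    looks_like_target(url) returns True if url appears to be the moved page.
    Both callbacks are hypothetical and supplied by the caller.
    """
    queue = deque(parent_urls(broken_url))
    seen = set(queue) | {broken_url}
    visited = 0
    while queue and visited < max_pages:
        page = queue.popleft()
        visited += 1
        for link in fetch_links(page):
            if link in seen:
                continue
            seen.add(link)
            if looks_like_target(link):
                return link
            queue.append(link)
    # Budget exhausted or nothing left to crawl: no candidate found.
    return None
```

Because the crawl starts from a handful of seeds near the old address and stops after `max_pages` fetches, it never needs to search the entire Web. A real system would replace the two callbacks with an HTTP fetcher and a content-similarity check.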