Automatic repairing of web wrappers

  • Authors:
  • Boris Chidlovskii

  • Affiliations:
  • Xerox Research Centre Europe, Grenoble Laboratory, Meylan, France

  • Venue:
  • Proceedings of the 3rd international workshop on Web information and data management
  • Year:
  • 2001

Abstract

We study the problem of automatically repairing wrappers for Web information providers. The majority of Web wrappers use "hooks" or "landmarks" to find and extract relevant information from Web pages, and such wrappers often become inoperable when the page structure changes. The solution we propose in this paper extends conventional forward wrappers with two alternatives: classifiers built on content features of the extracted information, and wrappers that process pages backward. We report preliminary results on information extraction recovery and wrapper repair for a set of real changes made by Web providers.
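The abstract does not say how the forward wrapper, backward wrapper and content classifier work together. The following minimal Python sketch rests on our own assumptions and is not the paper's algorithm. Each wrapper is reduced to one hypothetical pair of string landmarks. The "content classifier" is a hypothetical regular expression that accepts prices such as `$12.50`. Recovery tries the forward wrapper first and falls back to the backward wrapper whenever the forward result is missing or rejected by the classifier.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical wrapper representation: a (left landmark, right landmark) pair.
Landmarks = Tuple[str, str]


def forward_extract(page: str, marks: Landmarks) -> Optional[str]:
    """Forward wrapper: scan left to right for the left landmark, then the next right landmark."""
    left, right = marks
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end].strip()


def backward_extract(page: str, marks: Landmarks) -> Optional[str]:
    """Backward wrapper: scan right to left for the last right landmark, then the nearest left landmark before it."""
    left, right = marks
    end = page.rfind(right)
    if end == -1:
        return None
    start = page.rfind(left, 0, end)
    if start == -1:
        return None
    return page[start + len(left):end].strip()


def looks_like_price(text: str) -> bool:
    """Hypothetical content classifier: accepts strings such as '$12.50' or '$9'."""
    return re.fullmatch(r"\$\d+(?:\.\d{2})?", text) is not None


def extract_with_recovery(
    page: str,
    forward_marks: Landmarks,
    backward_marks: Landmarks,
    is_valid: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Try the forward wrapper, then the backward one; accept only values the classifier validates."""
    attempts = (
        ("forward", forward_extract, forward_marks),
        ("backward", backward_extract, backward_marks),
    )
    for name, wrapper, marks in attempts:
        value = wrapper(page, marks)
        if value is not None and is_valid(value):
            return value, name
    return None, "failed"


# Hypothetical landmarks for a "price" field.
FORWARD = ("<b>Price:</b>", "<br>")
BACKWARD = ("</b>", "<br>Ships")

pages = {
    "original": "<b>Price:</b> $12.50 <br>Ships in 2 days",
    "label renamed": "<b>Our price:</b> $12.50 <br>Ships in 2 days",
    "field moved": "<b>Price:</b> see below<br><b>Sale:</b> $9.99 <br>Ships in 2 days",
    "unrecoverable": "<i>Price unavailable</i>",
}
for label, page in pages.items():
    print(label, extract_with_recovery(page, FORWARD, BACKWARD, looks_like_price))
```

Each page tests a different case. In "label renamed", the forward landmark disappears, but the backward wrapper still anchors on the end of the record. In "field moved", the forward wrapper finds its landmarks but extracts `see below`. The classifier rejects that value, so the backward result `$9.99` is used. A real system would learn the landmarks and classifier features from annotated pages instead of hard-coding them.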