Automatically Generating Labeled Examples for Web Wrapper Maintenance

  • Authors:
  • Juan Raposo; Alberto Pan; Manuel Alvarez; Justo Hidalgo

  • Affiliations:
  • University of A Coruña; University of A Coruña; University of A Coruña; Denodo Technologies Inc.

  • Venue:
  • WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2005

Abstract

For software programs to take full advantage of semi-structured web sources, wrapper programs must be built to provide a "machine-readable" view over them. A significant problem with this approach is that, because web sources are autonomous, they may change in ways that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy on a wide range of real-world web data extraction problems.
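The example-generation step can be made concrete with a minimal Python sketch. It rests on simplifying assumptions of our own: the query results collected while the old wrapper still worked are stored as attribute-to-value dictionaries, the changed page is treated as plain text, and a stored value becomes a labeled example only when it occurs exactly once in the new page. The names `LabeledExample`, `find_all`, and `generate_labeled_examples` are hypothetical. This is an illustration of the general idea, not a reproduction of the heuristics and algorithms proposed in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    attribute: str  # field name from the old wrapper's output (hypothetical schema)
    value: str      # value the old wrapper extracted before the source changed
    start: int      # character offset of the match in the new page
    end: int        # exclusive end offset of the match


def find_all(text: str, needle: str) -> list[int]:
    """Return every start offset of needle in text (overlapping matches included)."""
    positions, i = [], text.find(needle)
    while i != -1:
        positions.append(i)
        i = text.find(needle, i + 1)
    return positions


def generate_labeled_examples(stored_results: list[dict[str, str]],
                              new_page: str) -> list[LabeledExample]:
    """Label occurrences of previously extracted values in the changed page.

    Assumed, simplified rule: a value is labeled only if it appears exactly
    once in new_page; ambiguous (repeated) or missing values are skipped.
    """
    examples = []
    for record in stored_results:
        for attribute, value in record.items():
            positions = find_all(new_page, value)
            if len(positions) == 1:
                start = positions[0]
                examples.append(
                    LabeledExample(attribute, value, start, start + len(value)))
    return examples


if __name__ == "__main__":
    # Hypothetical stored query result and a hypothetical redesigned page.
    stored = [{"title": "Web Intelligence", "price": "$45"}]
    page = "<tr><td>Web Intelligence</td><td>$45</td></tr>"
    examples = generate_labeled_examples(stored, page)
    print([(e.attribute, e.start, e.end) for e in examples])
    # -> [('title', 8, 24), ('price', 33, 36)]
```

Skipping repeated values trades recall for precision: the sketch yields fewer examples, but each is less likely to mislabel the page, which matters because the examples are meant to feed a wrapper-induction step.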