ITPilot: A Toolkit for Industrial-Strength Web Data Extraction

  • Authors:
  • Alberto Pan; Juan Raposo; Manuel Alvarez; Paula Montoto; Jose Losada; Justo Hidalgo

  • Affiliations:
  • University of A Coruña; University of A Coruña; University of A Coruña; University of A Coruña; Denodo Technologies Inc.; Denodo Technologies Inc.

  • Venue:
  • WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2005

Abstract

In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today's Web sources are "human-readable" but not "machine-readable", these systems must address a number of difficult challenges, such as dealing with complex navigation sequences, extracting data from HTML pages and reacting to source changes. Denodo Corporation has developed ITPilot, an industrial-strength solution that allows complex "wrappers" for Web sources to be graphically generated and automatically maintained. This paper presents the architecture and the basic ideas "behind the scenes" in ITPilot.