Context Generalization for Information Extraction from the Web

  • Authors: Benjamin Habegger; Mohamed Quafafou
  • Affiliations: LINA, France; IAAI, France
  • Venue: WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year: 2004

Abstract

Many online data sources, such as product catalogs and online directories, are available on the web. Extracting information from such sources is hard because they are designed to be presented to human users. Many researchers have tackled the problem of building wrappers for such sources. The state-of-the-art approach is to use machine learning techniques based on fully labeled example pages. In this paper we propose and study an approach based on example instances. This allows the user to build a wrapper from only a handful of examples drawn from across the whole source, which makes it possible to take structural differences into account. The resulting patterns extract the instances of the relation that the examples describe and that are contained in the same data source.
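
The idea of learning a wrapper from example instances can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a two-attribute relation, locates each example in the page, and generalizes the surrounding contexts by keeping only the text that all examples share. The page, the examples, and the helper names (`common_suffix`, `learn_pattern`) are hypothetical.

```python
import re
from os.path import commonprefix  # character-wise common prefix of strings


def common_suffix(strings):
    """Return the longest suffix shared by all strings."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def learn_pattern(page, examples, window=20):
    """Build a regex from the contexts of known (name, price) instances.

    Illustrative assumption: both attributes of every example occur in
    order in `page`, and the text between them is identical each time.
    """
    lefts, seps, rights = [], [], []
    for name, price in examples:
        start = page.index(name)
        sep_start = start + len(name)
        sep_end = page.index(price, sep_start)
        end = sep_end + len(price)
        lefts.append(page[max(0, start - window):start])
        seps.append(page[sep_start:sep_end])
        rights.append(page[end:end + window])

    if len(set(seps)) != 1:
        raise ValueError("separators differ; this sketch cannot generalize them")

    # Generalize: keep only the context shared by every example.
    left = common_suffix(lefts)
    right = commonprefix(rights)

    # Lookarounds keep the contexts out of the match, so the closing
    # context of one instance can serve as the opening context of the next.
    return re.compile(
        "(?<=" + re.escape(left) + ")(.+?)"
        + re.escape(seps[0])
        + "(.+?)(?=" + re.escape(right) + ")"
    )


# Hypothetical catalog page and two example instances (first and last item).
PAGE = (
    "<ul>"
    "<li><b>Lamp</b> <i>12.50</i></li>"
    "<li><b>Desk</b> <i>89.00</i></li>"
    "<li><b>Chair</b> <i>45.90</i></li>"
    "<li><b>Shelf</b> <i>30.00</i></li>"
    "</ul>"
)
EXAMPLES = [("Lamp", "12.50"), ("Shelf", "30.00")]

if __name__ == "__main__":
    pattern = learn_pattern(PAGE, EXAMPLES)
    for name, price in pattern.findall(PAGE):
        print(name, price)
```

Choosing the first and last items as examples matters here: their contexts differ (`<ul>` before the first, `</ul>` after the last), so the generalized contexts shrink to the part common to every item, and the pattern also matches the two items that were never labeled.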