WetDL: a web information extraction language

  • Authors:
  • Benjamin Habegger; Mohamed Quafafou

  • Affiliations:
  • Laboratoire d'Informatique de Nantes Atlantique, Nantes, France; Institut des Applications Avancées de l'Internet, Marseille, France

  • Venue:
  • ADVIS'04: Proceedings of the Third International Conference on Advances in Information Systems
  • Year:
  • 2004

Abstract

Many online information sources are available on the Web. Giving machines access to such sources enables many interesting applications, such as using web data in mediators or software agents. Up to now, most work on information extraction from the web has concentrated on building wrappers, i.e., programs that reformat presentational HTML data into a more machine-comprehensible format. While wrappers are an important part of a web information extraction application, they are not sufficient to fully access a source. Indeed, it is also necessary to set up an infrastructure for building queries, fetching pages, extracting specific links, etc. In this paper we propose a language called WetDL that describes an information extraction task as a network of operators whose execution performs the desired extraction task.
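
To make the idea of a "network of operators" more concrete, the following is a minimal Python sketch of the three steps the abstract names (building a query, fetching a page, extracting specific links), chained so that each operator's output feeds the next. It is not WetDL syntax and does not reproduce the authors' implementation: the operator names (build_query, fetch_page, extract_links), the run_network helper, the example.org URLs, and the in-memory PAGES table standing in for the Web are all assumptions made for illustration. A linear chain is used here as the simplest possible network.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/search?q=wrappers": (
        "<html><body>"
        '<a class="result" href="/paper/1">Paper one</a>'
        '<a class="nav" href="/about">About</a>'
        '<a class="result" href="/paper/2">Paper two</a>'
        "</body></html>"
    ),
}


def build_query(keyword):
    """Operator 1 (assumed): turn a keyword into a search URL."""
    return "http://example.org/search?" + urlencode({"q": keyword})


def fetch_page(url):
    """Operator 2 (assumed): return (url, html); reads PAGES instead of the network."""
    return url, PAGES[url]


class _LinkCollector(HTMLParser):
    """Collects href values of <a> tags whose class attribute matches css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "a"
            and attributes.get("class") == self.css_class
            and "href" in attributes
        ):
            self.links.append(attributes["href"])


def extract_links(page):
    """Operator 3 (assumed): keep only the result links, made absolute."""
    base_url, html = page
    collector = _LinkCollector("result")
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.links]


def run_network(operators, value):
    """Execute a linear chain of operators, passing each output to the next."""
    for operator in operators:
        value = operator(value)
    return value


# The extraction task is described as data (a list of operators), then executed.
NETWORK = [build_query, fetch_page, extract_links]
links = run_network(NETWORK, "wrappers")
print(links)
```

Describing the task as a list of operators, separate from the code that executes it, mirrors the declarative flavor the abstract suggests; a fuller network would presumably also allow branching and merging between operators, which this sketch does not attempt.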