Web wrapper induction: a brief survey

  • Authors:
  • Sergio Flesca; Giuseppe Manco; Elio Masciari; Eugenio Rende; Andrea Tagarelli

  • Affiliations:
  • DEIS, University of Calabria, 87030 Rende, Italy E-mail: flesca@si.deis.unical.it (S. Flesca and E. Rende are partially supported by the EU project Infomix and by Lixto Software GmbH); ICAR-CNR, 87030 Rende, Italy E-mail: manco@icar.cnr.it; ICAR-CNR, 87030 Rende, Italy E-mail: masciari@icar.cnr.it; DEIS, University of Calabria, 87030 Rende, Italy E-mail: erende@si.deis.unical.it; DEIS, University of Calabria, 87030 Rende, Italy E-mail: {flesca,erende,tagarelli}@si.deis.unical.it

  • Venue:
  • AI Communications
  • Year:
  • 2004

Abstract

Nowadays, several companies use the information available on the Web for a variety of purposes. However, since most of this information is available only as HTML documents, several techniques for automatically extracting information from the Web have recently been defined. In this paper we review the main techniques and tools for extracting information from the Web and devise a taxonomy of existing systems. In particular, we emphasize the advantages and drawbacks of the analyzed techniques from the user's point of view.