DEByE - Data extraction by example

  • Authors:
  • Alberto H. F. Laender; Berthier Ribeiro-Neto; Altigran S. da Silva

  • Affiliations:
  • Department of Computer Science, Federal University of Minas Gerais, ICEx-UFMG, Caixa Postal 702, Belo Horizonte MG, Brazil (all three authors)

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2002

Abstract

In this paper we present DEByE (Data Extraction By Example), an approach to extracting data from Web sources based on a small set of examples specified by the user. The novelty is that the user specifies examples according to a structure of their own choosing, and that this structure is described at example specification time. To specify the examples, the user interacts with a tool we developed that adopts nested tables as its visual paradigm. Nested tables are simple and intuitive, and they shield the user from technical details of the extraction problem (such as HTML tags, formatting operators, and learning automata). The examples provided by the user are then used to generate patterns that allow data to be extracted from new documents. For the extraction, DEByE adopts a new bottom-up procedure that we propose, which proved very effective on a variety of Web sources in our experiments.
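The idea of turning user examples into extraction patterns can be illustrated with a minimal Python sketch. This is not the DEByE algorithm; it is a simplified stand-in that assumes a pattern is just the few characters of markup surrounding an example value. The names `learn_pattern`, `extract`, `sample`, and `new_page` are hypothetical and exist only for this illustration. Pairing the extracted atomic values into records at the end loosely echoes the bottom-up idea of assembling larger objects from smaller ones.

```python
import re


def learn_pattern(document: str, example: str, context: int = 4) -> re.Pattern:
    """Build a regex from the text surrounding one user-supplied example.

    Assumption for this sketch: the `context` characters before and after
    the example are enough to locate similar values elsewhere.
    """
    start = document.index(example)  # raises ValueError if the example is absent
    end = start + len(example)
    prefix = document[max(0, start - context):start]
    suffix = document[end:end + context]
    # Capture the shortest run of text between the learned prefix and suffix.
    return re.compile(re.escape(prefix) + r"(.+?)" + re.escape(suffix))


def extract(pattern: re.Pattern, document: str) -> list:
    """Apply a learned pattern to a document and return every captured value."""
    return pattern.findall(document)


# A page on which the user marks one example per attribute (hypothetical data).
sample = "<li><b>Dune</b> $9</li><li><b>Emma</b> $7</li>"
# A new page from the same source, never seen at example time.
new_page = "<li><b>Ulysses</b> $12</li><li><b>Walden</b> $5</li>"

title_pattern = learn_pattern(sample, "Dune")
price_pattern = learn_pattern(sample, "$9")

titles = extract(title_pattern, new_page)
prices = extract(price_pattern, new_page)

# Assemble atomic values into (title, price) records.
records = list(zip(titles, prices))
print(records)
```

In this sketch the user never writes a tag or a regular expression: marking "Dune" and "$9" on one page is enough to pull titles and prices from another page with the same layout. A real system would need far more robust patterns than fixed-width context windows.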