Adapting Searchy to extract data using evolved wrappers

  • Authors:
  • David F. Barrero;María D. R-Moreno;David Camacho

  • Affiliations:
  • Universidad de Alcalá, Computer Engineering Department, Escuela Politécnica, Ctra. Madrid-Barcelona km 31,600, Alcalá de Henares, 28871 Madrid, Spain;Universidad de Alcalá, Computer Engineering Department, Escuela Politécnica, Ctra. Madrid-Barcelona km 31,600, Alcalá de Henares, 28871 Madrid, Spain;Universidad Autónoma de Madrid, Escuela Politécnica Superior, C/Francisco Tomás y Valiente 11, Ciudad Universitaria de Cantoblanco, 28049 Madrid, Spain

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

Organizations need diverse information systems to cope with growing requirements for information storage and processing. This leads to the creation of information islands and, with them, an intrinsic difficulty in obtaining a global view. Providing a unified view of the (likely heterogeneous) information available in an organization adds value to its information systems and has been the subject of intense research. In this paper we present an extension of Searchy, an agent-based mediator system specialized in data extraction and integration. Through a set of wrappers, it integrates information from arbitrary sources and semantically translates it according to a mediated scheme. Searchy is in fact a domain-independent wrapper container that eases wrapper development by providing, for example, semantic mapping. The extension proposed in this paper introduces an evolutionary wrapper that is able to evolve wrappers based on regular expressions. To achieve this, a Genetic Algorithm (GA) is used to learn a regex that extracts a set of positive samples while rejecting a set of negative samples.
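
The idea of learning a regex from positive and negative samples can be illustrated with a minimal sketch in Python. This is not the authors' implementation: the token alphabet, the fixed-length chromosome, the fitness function (accepted positives minus accepted negatives), and all parameter values are assumptions made only for illustration, since the abstract does not specify them.

```python
import random
import re

# Hypothetical building blocks for candidate regexes (an assumption;
# the abstract does not describe the alphabet actually used).
TOKENS = [r"\d", r"[a-z]", r"-", r"\.", r"@", r"+", r"?", ""]


def compile_candidate(genes):
    """Join the genes into a pattern; return None if it is not a valid regex."""
    try:
        return re.compile("".join(genes))
    except re.error:
        return None


def fitness(genes, positives, negatives):
    """Assumed fitness: positives fully matched minus negatives fully matched."""
    pattern = compile_candidate(genes)
    if pattern is None:
        return -len(negatives) - 1  # rank invalid regexes below any valid one
    hits = sum(1 for s in positives if pattern.fullmatch(s))
    false_hits = sum(1 for s in negatives if pattern.fullmatch(s))
    return hits - false_hits


def evolve(positives, negatives, length=6, pop_size=60, generations=80, seed=0):
    """Simple GA: truncation selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    population = [[rng.choice(TOKENS) for _ in range(length)]
                  for _ in range(pop_size)]

    def score(genes):
        return fitness(genes, positives, negatives)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        if score(population[0]) == len(positives):
            break  # every positive accepted, every negative rejected
        parents = population[: pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:
                child[rng.randrange(length)] = rng.choice(TOKENS)
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    return "".join(best), score(best)


if __name__ == "__main__":
    regex, best_score = evolve(["12", "345", "7"], ["ab", "x-1", ""])
    print(regex, best_score)
```

The GA is stochastic, so the regex it returns depends on the seed and parameters, and a perfect score is not guaranteed. A practical wrapper would also need a richer alphabet and a fitness function that rewards partial extraction, which this sketch leaves out.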