Logic wrappers and XSLT transformations for tuples extraction from HTML

  • Authors:
  • Costin Bădică; Amelia Bădică

  • Affiliations:
  • Software Engineering Department, University of Craiova, Craiova, RO, Romania; Business Information Systems Department, University of Craiova, Craiova, RO, Romania

  • Venue:
  • XSym'05: Proceedings of the Third International Conference on Database and XML Technologies
  • Year:
  • 2005

Abstract

Recently it was shown that existing general-purpose inductive logic programming systems are useful for learning wrappers (known as L-wrappers) that extract data from HTML documents. Here we propose a formalization of L-wrappers and their patterns, covering their syntax, semantics, related properties, and operations. We outline a mapping of these patterns to a subset of XSLT that has a formal semantics and demonstrate it with an example. The mapping shows how the theory can be applied to obtain efficient wrappers for extracting information from HTML.