Tuples extraction from HTML using logic wrappers and inductive logic programming

  • Authors:
  • Costin Bădică; Amelia Bădică; Elvira Popescu

  • Affiliations:
  • Software Engineering Department, University of Craiova, Craiova, Romania; Business Information Systems Department, University of Craiova, Craiova, Romania; Software Engineering Department, University of Craiova, Craiova, Romania

  • Venue:
  • AWIC'05 Proceedings of the Third international conference on Advances in Web Intelligence
  • Year:
  • 2005

Abstract

This paper presents an approach for applying inductive logic programming to information extraction from HTML documents represented as unranked ordered trees. We consider Web resources that can be abstracted as providing sets of tuples. The approach is based on a new class of wrappers, called logic wrappers, which are defined as a special class of logic programs. It is demonstrated with examples and experimental results on collecting product information, which highlight both the advantages and the limitations of the method.
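To make the "unranked ordered tree" and "set of tuples" view concrete, here is a minimal Python sketch under our own assumptions. An unranked ordered tree allows any number of children per node and keeps siblings in document order. The relation names `tag`, `text`, `child` and `next`, the `TreeBuilder` class, and the `extract_tuples` rule are hypothetical illustrations, not the authors' encoding. The rule is written by hand, whereas the paper's approach would learn a logic wrapper with inductive logic programming.

```python
from html.parser import HTMLParser

# Hypothetical encoding (not the paper's): each HTML node gets an integer id, and
# the tree is stored as relations tag(N, Name), text(N, S), child(P, C), next(L, R).
VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag


class TreeBuilder(HTMLParser):
    """Build an unranked ordered tree: any number of children, siblings in document order."""

    def __init__(self):
        super().__init__()
        self.tag = {0: "root"}   # node id -> element name ("#text" for text nodes)
        self.text = {}           # text node id -> stripped string
        self.child = []          # (parent, child) pairs
        self.next = []           # (left sibling, right sibling) pairs
        self._stack = [0]        # open elements; 0 is a virtual root
        self._last = {}          # parent id -> its most recently added child
        self._count = 1

    def _new_node(self, name):
        node, self._count = self._count, self._count + 1
        parent = self._stack[-1]
        self.tag[node] = name
        self.child.append((parent, node))
        if parent in self._last:
            self.next.append((self._last[parent], node))
        self._last[parent] = node
        return node

    def handle_starttag(self, tag, attrs):
        node = self._new_node(tag)
        if tag not in VOID:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self.tag[self._stack[i]] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        data = data.strip()
        if data:  # whitespace between tags does not become a node
            self.text[self._new_node("#text")] = data


def extract_tuples(tree):
    """Hand-written (not learned) wrapper, roughly the clause
    tuple(X, Y) :- tag(R, tr), child(R, C1), next(C1, C2), tag(C1, td), tag(C2, td),
                   X = text of C1, Y = text of C2.
    Returns one (X, Y) tuple per pair of adjacent <td> cells in a <tr>."""
    kids = {}
    for parent, node in tree.child:
        kids.setdefault(parent, []).append(node)
    right = dict(tree.next)

    def cell_text(node):  # concatenates only the direct text children of a cell
        return " ".join(tree.text[k] for k in kids.get(node, []) if k in tree.text)

    tuples = []
    for row, name in tree.tag.items():  # node ids increase in document order
        if name != "tr":
            continue
        for c1 in kids.get(row, []):
            c2 = right.get(c1)
            if c2 is not None and tree.tag[c1] == "td" and tree.tag[c2] == "td":
                tuples.append((cell_text(c1), cell_text(c2)))
    return tuples


page = """<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>Laptop</td><td>999</td></tr>
<tr><td>Mouse</td><td>25</td></tr>
</table>"""
tree = TreeBuilder()
tree.feed(page)
tree.close()
print(extract_tuples(tree))  # [('Laptop', '999'), ('Mouse', '25')]
```

The header row is skipped because its cells are `th`, not `td`, so the rule body alone decides which tuples are returned. In the approach the abstract describes, inductive logic programming would induce such a rule body from examples instead of having it written by hand; that learning step is not shown here.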