Sub Node Extraction with Tree Based Wrappers

  • Authors:
  • Stefan Raeymaekers;Maurice Bruynooghe

  • Affiliations:
  • K.U. Leuven, Dept. of Computer Science, Celestijnenlaan 200A, 3001 Leuven, Belgium, email: stefan.raeymaekers@cs.kuleuven.be;K.U. Leuven, Dept. of Computer Science, Celestijnenlaan 200A, 3001 Leuven, Belgium, email: maurice.bruynooghe@cs.kuleuven.be

  • Venue:
  • Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
  • Year:
  • 2008

Abstract

Both string-based and tree-based methods have been used to learn wrappers for extraction from semi-structured documents (e.g., HTML documents). Previous work has shown that tree-based approaches perform better than string-based approaches while needing fewer examples. A disadvantage is that they can only extract complete text nodes, whereas string-based approaches can extract within text nodes. This paper proposes a hybrid approach that combines the advantages of both approaches and compares it experimentally with a string-based approach on some sub node extraction tasks.
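
The distinction between extracting a complete text node and extracting within one can be illustrated with a minimal sketch. This is not the authors' learning algorithm: the document fragment, the hand-written path expression, the regular expression, and the helper names `select_text_node` and `extract_within` are all assumptions made for illustration. A learned wrapper would induce the equivalent of both steps from examples.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document fragment; not taken from the paper.
PAGE = """
<html><body>
  <table>
    <tr><td>Name</td><td>Price</td></tr>
    <tr><td>Widget</td><td>Price: 12.50 EUR (incl. VAT)</td></tr>
  </table>
</body></html>
"""


def select_text_node(root, path):
    """Tree-based step: return the complete text of the node addressed by path."""
    node = root.find(path)
    return node.text if node is not None else None


def extract_within(text, pattern):
    """String-based step: return the first capture group matched inside text."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


root = ET.fromstring(PAGE.strip())

# A purely tree-based wrapper stops here: it can only return the whole text node.
whole_node = select_text_node(root, "./body/table/tr[2]/td[2]")

# The hybrid idea, as sketched here: apply a string pattern inside the selected node.
price = extract_within(whole_node, r"Price:\s*([0-9]+\.[0-9]+)")

print(whole_node)
print(price)
```

In this sketch the tree step narrows the search to a single text node, so the string pattern only has to discriminate within that node rather than across the whole document.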