Information Extraction from HTML: Combining XML and Standard Techniques for IE from the Web

  • Authors:
  • Luo Xiao; Dieter Wissmann; Michael Brown; Stefan Jablonski

  • Venue:
  • Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems: Engineering of Intelligent Systems
  • Year:
  • 2001

Abstract

This paper describes information extraction (IE) for applications that automatically fill templates from HTML documents. We developed a complete system to extract information from Web sites. The system can use a number of algorithms to learn the document structure, the rules and keywords that locate specific information, and the spatial relations between different information items. Experiments with a well-known data set show a substantial performance improvement over standard wrapper systems.
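
The abstract gives no implementation detail, so the following is only a minimal sketch of what keyword-driven template filling from HTML could look like. It is not the authors' system. Everything in it is an assumption made for illustration: the names `CellCollector`, `fill_template`, `SAMPLE_HTML` and `SLOT_KEYWORDS` are hypothetical, the keywords are written by hand rather than learned, and the only spatial relation it models is "the value sits in the table cell immediately to the right of its label".

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of table cells, row by row (assumes simple, well-formed tables)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def fill_template(html, slot_keywords):
    """Fill each slot with the cell to the right of the first label cell matching a keyword.

    slot_keywords maps a slot name to a list of lowercase keywords (hand-written
    here; an assumption of this sketch, not something taken from the paper).
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()

    template = {slot: None for slot in slot_keywords}
    for row in parser.rows:
        # Pair every cell with its right-hand neighbour: (label, value).
        for label, value in zip(row, row[1:]):
            for slot, keywords in slot_keywords.items():
                if template[slot] is None and any(k in label.lower() for k in keywords):
                    template[slot] = value
    return template


# Hypothetical input page and slot definitions, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td>Product name:</td><td>Widget 3000</td></tr>
  <tr><td>Price:</td><td>19.99 EUR</td></tr>
</table>
"""
SLOT_KEYWORDS = {"name": ["name"], "price": ["price", "cost"]}

print(fill_template(SAMPLE_HTML, SLOT_KEYWORDS))
```

Slots whose keywords match no label stay `None`, which keeps a partially filled template distinguishable from a complete one. A learned system of the kind the abstract describes would presumably induce the keywords and spatial relations from examples instead of taking them as fixed input.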