A Unifying Approach to HTML Wrapper Representation and Learning

  • Authors:
  • Gunter Grieser;Klaus P. Jantke;Steffen Lange;Bernd Thomas

  • Venue:
  • DS '00 Proceedings of the Third International Conference on Discovery Science
  • Year:
  • 2000

Abstract

The number, the size, and the dynamics of Internet information sources bear abundant evidence of the need for automation in information extraction. This calls for representation formalisms that match the World Wide Web reality and for learning approaches and learnability results that apply to these formalisms. The concept of elementary formal systems is appropriately generalized to allow for the representation of wrapper classes which are relevant to the description of Internet sources in HTML format. Related learning results prove that those wrappers are automatically learnable from examples. This sets the stage for information extraction from the Internet by exploiting inductive learning techniques.