Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling

  • Authors:
  • Xiaoying Gao; Mengjie Zhang; Peter Andreae

  • Venue:
  • WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
  • Year:
  • 2003

Abstract

This paper describes a domain-independent approach to automatically constructing information extraction patterns for semi-structured web pages. The approach was tested on three corpora, each containing a series of tabular web sites from different domains, and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of that page.