Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling

Authors:
Xiaoying Gao;Mengjie Zhang;Peter Andreae
Venue:
WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
Year:
2003

Citing 0
Cited 1

Efficient Wrapper Reinduction from Dynamic Web Sources

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence

Quantified Score

Hi-index	0.00

Abstract

This paper describes a domain independent approach to automatically constructing information extraction patterns for semi-structured web pages. The approach was tested on three corpora containing a series of tabular web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.