Efficient Wrapper Reinduction from Dynamic Web Sources
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Hi-index | 0.00 |
This paper describes a domain independent approach to automatically constructing information extraction patterns for semi-structured web pages. The approach was tested onthree corpora containing a series of tabular web sites from different domains and achieved a success rate of at least 80%. A signi.cant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.