Tuples extraction from HTML using logic wrappers and inductive logic programming
AWIC'05 Proceedings of the Third international conference on Advances in Web Intelligence
Logic wrappers and XSLT transformations for tuples extraction from HTML
XSym'05 Proceedings of the Third international conference on Database and XML Technologies
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The methodology is applied to information extraction from real-world HTML page sets that represent product information sheets, an important task in product data integration. It addresses two problems: defining information extraction rules in the form of logic wrappers, and mapping the task of learning these rules to general-purpose first-order inductive learning.
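
As an illustration only, the following Python sketch shows one way an HTML page can be viewed as an unranked ordered tree described by relational facts, and how a hand-written rule over those facts can return tuples. The relation names (`child`, `next_sib`, `tag_of`, `text_of`), the rule `pair(N, V)`, and the sample markup are assumptions made for this example; they are not taken from the paper, and the inductive learning step that would produce such a rule automatically is not shown. The sketch also assumes well-formed markup in which every element is explicitly closed.

```python
from html.parser import HTMLParser


class TreeFacts(HTMLParser):
    """Collect tree facts from well-formed HTML (hypothetical encoding)."""

    def __init__(self):
        super().__init__()
        self.tag_of = {0: "root"}   # node id -> tag name ("text" for text nodes)
        self.text_of = {}           # text node id -> stripped text
        self.child = set()          # (parent id, child id)
        self.next_sib = set()       # (left sibling id, right sibling id)
        self._stack = [0]           # path of open elements, root first
        self._last_child = {}       # parent id -> most recently added child
        self._count = 0

    def _add(self, tag):
        # Create a node under the currently open element and link it
        # to its parent and to its immediate left sibling, if any.
        self._count += 1
        node, parent = self._count, self._stack[-1]
        self.tag_of[node] = tag
        self.child.add((parent, node))
        if parent in self._last_child:
            self.next_sib.add((self._last_child[parent], node))
        self._last_child[parent] = node
        return node

    def handle_starttag(self, tag, attrs):
        self._stack.append(self._add(tag))

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.text_of[self._add("text")] = data.strip()


def facts_from_html(html):
    parser = TreeFacts()
    parser.feed(html)
    parser.close()
    return parser


def extract_pairs(f):
    """Hand-written stand-in for a logic wrapper, roughly:
    pair(N, V) :- tag(R, tr), child(R, A), child(R, B), next(A, B),
                  tag(A, td), tag(B, td),
                  child(A, TA), text(TA, N), child(B, TB), text(TB, V).
    """
    parent = {c: p for p, c in f.child}
    kids = {}
    for p, c in sorted(f.child):
        kids.setdefault(p, []).append(c)
    pairs = []
    for a, b in sorted(f.next_sib):
        if f.tag_of[a] == "td" and f.tag_of[b] == "td" and f.tag_of[parent[a]] == "tr":
            for ta in kids.get(a, []):
                for tb in kids.get(b, []):
                    if ta in f.text_of and tb in f.text_of:
                        pairs.append((f.text_of[ta], f.text_of[tb]))
    return pairs


# Made-up product sheet fragment, used only to exercise the sketch.
sample = ("<table><tr><td>Weight</td><td>12 kg</td></tr>"
          "<tr><td>Power</td><td>90 W</td></tr></table>")
print(extract_pairs(facts_from_html(sample)))
```

In this encoding a node may have any number of children (unranked) and sibling order is kept through `next_sib` (ordered), which is what lets the rule say "the cell immediately to the right of this one".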