This paper presents an approach for applying inductive logic programming to information extraction from HTML documents structured as unranked ordered trees. We consider information extraction from Web resources that can be abstracted as providing sets of tuples. Our approach is based on defining a new class of wrappers as a special class of logic programs, called logic wrappers. The approach is demonstrated with examples and experimental results in the area of collecting product information, highlighting the advantages and the limitations of the method.
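To make the setting concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes a well-formed HTML fragment with explicit closing tags, represents it as an ordered tree, and applies one hand-written extraction rule of the kind an inductive learner might produce. The relation names in the comments (tag, child, next), the rule itself, and the sample product table are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds an ordered tree of element nodes (each node is a dict).

    Assumption: the input is well formed, with an explicit closing tag
    for every opening tag (no unclosed void elements such as <br>).
    """

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": [], "text": ""}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": [], "text": ""}
        # Children are appended in document order, so the tree is ordered
        # and a node may have any number of children (unranked).
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1]["text"] += data.strip()


def parse(html):
    """Parse an HTML string and return the root of the ordered tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def descendants(node):
    """Yield all descendants of a node in document order."""
    for child in node["children"]:
        yield child
        yield from descendants(child)


def extract_pairs(root):
    """Apply one hypothetical rule, written here in logic-program style:

        extract(Name, Price) :-
            tag(Row, tr), child(Row, Name), child(Row, Price),
            next(Name, Price), tag(Name, td), tag(Price, td).

    It returns the text of every pair of adjacent <td> siblings in a <tr>.
    """
    tuples = []
    for row in descendants(root):
        if row["tag"] != "tr":
            continue
        kids = row["children"]
        for left, right in zip(kids, kids[1:]):  # next(left, right)
            if left["tag"] == "td" and right["tag"] == "td":
                tuples.append((left["text"], right["text"]))
    return tuples


# Hypothetical product listing used only as sample input.
SAMPLE = (
    "<table>"
    "<tr><td>Printer</td><td>120</td></tr>"
    "<tr><td>Scanner</td><td>85</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    print(extract_pairs(parse(SAMPLE)))
```

In this sketch the rule is fixed by hand. In an inductive logic programming setting, such a rule would instead be learned from annotated example documents, and the extracted pairs correspond to the set of tuples the Web resource is assumed to provide.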