Information Extraction from HTML: Combining XML and Standard Techniques for IE from the Web

Authors:
Luo Xiao, Dieter Wissmann, Michael Brown, Stefan Jablonski
Venue:
Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems: Engineering of Intelligent Systems
Year:
2001

Abstract

This paper describes information extraction (IE) for applications that automatically fill templates from HTML input documents. We developed a complete system for extracting information from Web sites. The system can apply a number of algorithms to learn the document structure, the rules and keywords that locate specific information, and the spatial relations between different information items. Experiments on a well-known data set show a substantial performance improvement over standard wrapper systems.
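The abstract does not spell out the learning algorithms. As a rough illustration only, and not the authors' system, the following Python sketch learns one keyword per template slot from a single annotated page, then uses a simple "label precedes value" spatial relation to fill the same slots on a new page. The slot names, the example pages, and the helpers `text_chunks`, `learn_rules`, and `fill_template` are all hypothetical; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text chunks in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only data between tags
            self.chunks.append(text)


def text_chunks(html):
    """Flatten an HTML page into its visible text chunks."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def learn_rules(example_html, annotations):
    """Learn one keyword per slot from a single annotated example page.

    Toy heuristic (an assumption, not the paper's method): the keyword for
    a slot is the text chunk immediately preceding the annotated value.
    """
    chunks = text_chunks(example_html)
    rules = {}
    for slot, value in annotations.items():
        if value in chunks:
            i = chunks.index(value)
            if i > 0:
                rules[slot] = chunks[i - 1].rstrip(":").lower()
    return rules


def fill_template(html, rules):
    """Fill each slot with the chunk that follows its learned keyword.

    Slots whose keyword does not occur on the page stay None.
    """
    chunks = text_chunks(html)
    template = {slot: None for slot in rules}
    for slot, keyword in rules.items():
        for i, chunk in enumerate(chunks[:-1]):
            if chunk.rstrip(":").lower() == keyword:
                template[slot] = chunks[i + 1]
                break
    return template


# Hypothetical training page with annotated slot values.
train_page = """<table>
<tr><td>Title:</td><td>Senior Engineer</td></tr>
<tr><td>Location:</td><td>Munich</td></tr>
</table>"""

# Hypothetical unseen page with the rows in a different order.
new_page = """<table>
<tr><td>Location:</td><td>Berlin</td></tr>
<tr><td>Title:</td><td>Data Analyst</td></tr>
</table>"""

rules = learn_rules(train_page, {"position": "Senior Engineer", "city": "Munich"})
print(rules)                           # {'position': 'title', 'city': 'location'}
print(fill_template(new_page, rules))  # {'position': 'Data Analyst', 'city': 'Berlin'}
```

Because the learned rules anchor on labels rather than on absolute positions, they still fill the template correctly when the rows are reordered; the heuristic breaks down when a label is missing, repeated, or not adjacent to its value.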