Information Extraction in Structured Documents Using Tree Automata Induction

Authors:
Raymond Kosala; Jan Van den Bussche; Maurice Bruynooghe; Hendrik Blockeel
Venue:
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2002

Abstract

Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents formatted in HTML or XML uses techniques designed for IE from strings, such as grammar and automata induction. However, such documents have a tree structure, so it is natural to investigate methods that can recognise and exploit it. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.