Information extraction (IE) aims at extracting specific information from a collection of documents. Much previous work on IE from semi-structured documents (in XML or HTML) uses learning techniques based on strings. Some recent work converts the document to a ranked tree and uses tree automaton induction. This paper introduces an algorithm that induces an automaton directly from unranked trees. Experiments show that this approach gives the best results obtained so far among learning-based methods for IE from semi-structured documents.
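The difference between ranked and unranked trees can be illustrated with a minimal sketch. It assumes the first-child/next-sibling encoding, a standard way to turn an unranked tree into a binary (ranked) one; the abstract does not say which conversion the earlier work uses, and it does not describe the induction algorithm itself, which is not shown here. The names `Node`, `BinNode`, `TreeBuilder`, `parse`, `encode`, and `show` are illustrative only.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class Node:
    """Unranked tree node: a label with any number of ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class BinNode:
    """Ranked (binary) node: left = first child, right = next sibling."""
    label: str
    left: Optional["BinNode"] = None
    right: Optional["BinNode"] = None


class TreeBuilder(HTMLParser):
    """Builds an unranked tree of element tags from properly nested markup."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Assumption: every start tag has a matching end tag (no void elements).
        if len(self._stack) > 1:
            self._stack.pop()


def parse(markup: str) -> Node:
    """Parse markup into an unranked tree rooted at a synthetic '#document'."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def encode(siblings: List[Node]) -> Optional[BinNode]:
    """First-child/next-sibling encoding of an ordered list of siblings."""
    if not siblings:
        return None
    head, rest = siblings[0], siblings[1:]
    return BinNode(head.label, encode(head.children), encode(rest))


def show(node: Optional[BinNode]) -> str:
    """Render a binary tree as label(left, right), with '-' for a missing child."""
    if node is None:
        return "-"
    return f"{node.label}({show(node.left)}, {show(node.right)})"


doc = parse("<table><tr><td></td><td></td><td></td></tr></table>")
binary = encode(doc.children)
print(show(binary))
```

In the unranked tree the three `td` cells are direct children of `tr`. In the binary encoding they form a right-leaning chain, so cells that were siblings end up at different depths.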