Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, such as HTML or XML, uses learning techniques that are based on strings, such as finite automata induction. These methods do not exploit the tree structure of the documents. A natural way to exploit that structure is to induce tree automata, which are like finite-state automata but process trees instead of strings. In this work, we explore the induction of k-testable ranked tree automata from a small set of annotated examples. We describe three variants, which differ in the way they generalize the inferred automaton. Experimental results on a set of benchmark data sets show that our approach compares favorably to string-based approaches. However, the quality of the extraction is still suboptimal.
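To make the idea of a k-testable tree language more concrete, the following is a minimal sketch for the simplified case k = 2. It is not the algorithm from the paper: the names `Tree`, `features`, and `TwoTestableModel` are hypothetical, and the model only records which root labels, parent/children label patterns ("forks"), and leaf labels occur in the positive examples. A new tree is accepted when all of its local patterns were seen during learning, which is how such a model can generalize beyond the training trees.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Tree(NamedTuple):
    """A labelled ordered tree (hypothetical helper type for this sketch)."""
    label: str
    children: Tuple["Tree", ...] = ()


Fork = Tuple[str, Tuple[str, ...]]


def features(t: Tree) -> Tuple[str, Set[Fork], Set[str]]:
    """Return the root label, the set of forks, and the set of leaf labels."""
    forks: Set[Fork] = set()
    leaves: Set[str] = set()
    stack = [t]
    while stack:
        node = stack.pop()
        if node.children:
            # A fork pairs a node label with the labels of its children.
            forks.add((node.label, tuple(c.label for c in node.children)))
            stack.extend(node.children)
        else:
            leaves.add(node.label)
    return t.label, forks, leaves


class TwoTestableModel:
    """Simplified 2-testable-style acceptor learned from positive examples."""

    def __init__(self) -> None:
        self.roots: Set[str] = set()
        self.forks: Set[Fork] = set()
        self.leaves: Set[str] = set()

    def learn(self, examples: Iterable[Tree]) -> None:
        # Accumulate every local pattern observed in the examples.
        for t in examples:
            root, forks, leaves = features(t)
            self.roots.add(root)
            self.forks |= forks
            self.leaves |= leaves

    def accepts(self, t: Tree) -> bool:
        # Accept only if every local pattern of t was observed in training.
        root, forks, leaves = features(t)
        return (root in self.roots
                and forks <= self.forks
                and leaves <= self.leaves)


# Two small training trees, written with made-up HTML-like labels.
t1 = Tree("ul", (Tree("li", (Tree("a"),)),))
t2 = Tree("ul", (Tree("li", (Tree("b"),)), Tree("li", (Tree("a"),))))

model = TwoTestableModel()
model.learn([t1, t2])

# Unseen tree built only from observed local patterns.
t3 = Tree("ul", (Tree("li", (Tree("b"),)), Tree("li", (Tree("b"),))))
# Unseen fork: a "ul" node with three "li" children.
t4 = Tree("ul", tuple(Tree("li", (Tree("a"),)) for _ in range(3)))
# Unseen root label.
t5 = Tree("li", (Tree("a"),))
```

In this sketch `model.accepts(t3)` holds even though `t3` was never seen, while `t4` and `t5` are rejected because they contain a fork or root label absent from the examples. Larger values of k would record deeper patterns and generalize less aggressively.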