Learning regular sets from queries and counterexamples. Information and Computation.
On the finite degree of ambiguity of finite tree automata. Acta Informatica.
Learning context-free grammars from structural data in polynomial time. Theoretical Computer Science.
Random DFA's can be approximately learned from sparse uniform examples. COLT '92 Proceedings of the fifth annual workshop on Computational learning theory.
Characteristic Sets for Polynomial Grammatical Inference. Machine Learning.
Generating finite-state transducers for semi-structured data extraction from the Web. Information Systems - Special issue on semistructured data.
Wrapper induction: efficiency and expressiveness. Artificial Intelligence - Special issue on Intelligent internet systems.
Expressiveness of structured document query languages based on attribute grammars. Journal of the ACM (JACM).
Query automata over finite trees. Theoretical Computer Science.
Hierarchical Wrapper Induction for Semistructured Information Sources. Autonomous Agents and Multi-Agent Systems.
ICGI '98 Proceedings of the 4th International Colloquium on Grammatical Inference.
ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications.
Monadic Queries over Tree-Structured Data. LICS '02 Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science.
Visual Web Information Extraction with Lixto. Proceedings of the 27th International Conference on Very Large Data Bases.
Locating Matches of Tree Patterns in Forests. Proceedings of the 18th Conference on Foundations of Software Technology and Theoretical Computer Science.
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence.
Query Evaluation on Compressed Trees (Extended Abstract). LICS '03 Proceedings of the 18th Annual IEEE Symposium on Logic in Computer Science.
Wrapper induction for information extraction.
Wrapper induction for information extraction.
Information extraction from web documents based on local unranked tree automaton inference. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence.
N-ary queries by tree automata. DBPL'05 Proceedings of the 10th international conference on Database Programming Languages.
Logics for unranked trees: an overview. ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming.
Learning (k,l)-contextual tree languages for information extraction. ECML'05 Proceedings of the 16th European conference on Machine Learning.
Deterministic automata on unranked trees. FCT'05 Proceedings of the 15th international conference on Fundamentals of Computation Theory.
Schema-Guided Induction of Monadic Queries. ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications.
Identification in the Limit of k,l-Substitutable Context-Free Languages. ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications.
Learning Balls of Strings from Edit Corrections. The Journal of Machine Learning Research.
Automatic wrapper induction from hidden-web sources with domain knowledge. Proceedings of the 10th ACM workshop on Web information and data management.
Sub Node Extraction with Tree Based Wrappers. Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence.
A learning algorithm for top-down XML transformations. Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory.
Zulu: an interactive learning competition. FSMNLP'09 Proceedings of the 8th international conference on Finite-state methods and natural language processing.
Theoretical Computer Science.
Learning n-ary node selecting tree transducers from completely annotated examples. ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications.
Learning multiplicity tree automata. ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications.
The HiLeX system for semantic information extraction. Transactions on Large-Scale Data- and Knowledge-Centered Systems V.
Learning twig and path queries. Proceedings of the 15th International Conference on Database Theory.
DLT'12 Proceedings of the 16th international conference on Developments in Language Theory.
Learning queries for relational, semi-structured, and graph databases. Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium.
Query induction with schema-guided pruning strategies. The Journal of Machine Learning Research.
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction.

We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second-order logic (MSO) over trees, which Gottlob and Koch (2002) argue has the right expressiveness for Web information extraction, and that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial-time algorithm in the style of RPNI that learns monadic queries defined by deterministic NSTTs from completely annotated examples, in which every selected node is marked.

In practice, users prefer to provide partial annotations. We propose to account for partial annotations by means of intelligent tree pruning heuristics, and introduce pruning NSTTs, a formalism that shares many of the advantages of NSTTs. This leads to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987).

We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system, called SQUIRREL, by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent extraction quality with very few user interactions during wrapper induction.
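The claim that NSTT-defined queries can be answered efficiently can be illustrated with a small Python sketch. It rests on assumptions that are not spelled out in the abstract. First, an NSTT is taken to be a deterministic bottom-up tree automaton over pairs (label, bit), where bit 1 marks a selected node. Second, the automaton is assumed to accept at most one such annotation of any input tree, and the query selects the nodes marked 1 in that annotation. For brevity the sketch also uses binary trees instead of the unranked trees the paper targets, and it does not reproduce how the paper handles unranked trees. The names `Tree`, `NSTT`, `answer` and `EXAMPLE`, and the example query itself, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional, Tuple

State = Hashable
Path = Tuple[int, ...]  # child indices from the root; () is the root


@dataclass(frozen=True)
class Tree:
    """Binary tree: a leaf has no children, an inner node exactly two."""
    label: str
    children: Tuple["Tree", ...] = ()


@dataclass
class NSTT:
    """Hypothetical encoding of a deterministic bottom-up automaton over
    (label, bit) pairs; bit 1 marks a selected node. None = no transition."""
    leaf: Callable[[str, int], Optional[State]]
    node: Callable[[str, int, State, State], Optional[State]]
    final: Callable[[State], bool]


def answer(a: NSTT, t: Tree) -> List[Path]:
    """Paths of the nodes marked 1 in the accepted annotation of t ([] if none)."""
    # witness[p][q] = (bit, q_left, q_right): one annotation choice reaching q at p.
    witness: Dict[Path, Dict[State, tuple]] = {}

    def up(v: Tree, p: Path) -> None:
        # Bottom-up pass: collect every state reachable at v under some annotation.
        reach: Dict[State, tuple] = {}
        if not v.children:
            for bit in (0, 1):
                q = a.leaf(v.label, bit)
                if q is not None:
                    reach.setdefault(q, (bit, None, None))
        else:
            left, right = v.children
            up(left, p + (0,))
            up(right, p + (1,))
            for bit in (0, 1):
                for ql in witness[p + (0,)]:
                    for qr in witness[p + (1,)]:
                        q = a.node(v.label, bit, ql, qr)
                        if q is not None:
                            reach.setdefault(q, (bit, ql, qr))
        witness[p] = reach

    up(t, ())
    accepting = [q for q in witness[()] if a.final(q)]
    if not accepting:
        return []
    selected: List[Path] = []

    def down(v: Tree, p: Path, q: State) -> None:
        # Top-down pass: replay the recorded choices; functionality makes them unique.
        bit, ql, qr = witness[p][q]
        if bit:
            selected.append(p)
        if v.children:
            down(v.children[0], p + (0,), ql)
            down(v.children[1], p + (1,), qr)

    down(t, (), accepting[0])
    return sorted(selected)


# Hypothetical query: select every 'a'-leaf iff the tree contains a 'b'-leaf.
# States are (mode, has_b) with mode in {"none", "sel", "nosel"}.
def _merge(m1: str, m2: str) -> Optional[str]:
    if m1 == "none":
        return m2
    if m2 == "none":
        return m1
    return m1 if m1 == m2 else None


def _leaf(label: str, bit: int) -> Optional[State]:
    if label == "a":
        return ("sel" if bit else "nosel", False)
    if bit:
        return None  # only 'a'-leaves may be selected
    return ("none", label == "b")


def _node(label: str, bit: int, ql: State, qr: State) -> Optional[State]:
    mode = _merge(ql[0], qr[0])
    if bit or mode is None:
        return None  # inner nodes are never selected; child modes must agree
    return (mode, ql[1] or qr[1])


def _final(q: State) -> bool:
    mode, has_b = q
    return mode == "none" or (mode == "sel") == has_b


EXAMPLE = NSTT(_leaf, _node, _final)

t1 = Tree("f", (Tree("a"), Tree("g", (Tree("b"), Tree("a")))))
print(answer(EXAMPLE, t1))  # [(0,), (1, 1)]
```

Each inner node tries at most 2·|Q|² transitions, so for a fixed automaton this sketch runs in time linear in the size of the tree. The example query needs a global guess: whether a leaf is selected depends on a 'b'-leaf that may be elsewhere in the tree. Keeping an arbitrary witness per state is safe only because of the assumed functionality condition. If two different annotations reached the same state in an accepting run, both would be accepted.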