Query induction with schema-guided pruning strategies

Authors:
Joachim Niehren;Jérôme Champavère;Aurélien Lemay;Rémi Gilleron
Affiliations:
INRIA Lille & LIFL, Parc Scientifique de la Haute Borne, Villeneuve d'Ascq, France;INRIA Lille & LIFL, Parc Scientifique de la Haute Borne, Villeneuve d'Ascq, France;INRIA Lille & LIFL, Parc Scientifique de la Haute Borne, Villeneuve d'Ascq, France;INRIA Lille & LIFL, Parc Scientifique de la Haute Borne, Villeneuve d'Ascq, France
Venue:
The Journal of Machine Learning Research
Year:
2013

Abstract

Inference algorithms for tree automata that define node-selecting queries in unranked trees rely on tree pruning strategies. These strategies impose additional assumptions on node selection, which are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice on XML information extraction.
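
To make the idea of a tree pruning strategy concrete, the following is a minimal sketch of one simple, schema-free strategy: keep only the nodes on paths from the root to annotated nodes and collapse every other subtree into a placeholder leaf. It is an illustration only, not the schema-guided strategy or the learning algorithm of the paper, neither of which is specified in this abstract. The names `Node`, `PRUNED`, `prune`, and `show` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

PRUNED = "_"  # hypothetical placeholder label for a collapsed subtree


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    selected: bool = False  # annotation: the target query selects this node


def contains_selected(node: Node) -> bool:
    """True if this node or any of its descendants is annotated as selected."""
    return node.selected or any(contains_selected(c) for c in node.children)


def prune(node: Node) -> Node:
    """Return a pruned copy of the tree.

    Subtrees without any selected node are replaced by a single PRUNED leaf;
    all other nodes are kept with their label and annotation.
    """
    if not contains_selected(node):
        return Node(PRUNED)
    return Node(node.label, [prune(c) for c in node.children], node.selected)


def show(node: Node) -> str:
    """Render a tree as a term; selected nodes are marked with '*'."""
    mark = "*" if node.selected else ""
    if not node.children:
        return node.label + mark
    return f"{node.label}{mark}({', '.join(show(c) for c in node.children)})"


# A small unranked tree with one annotated table cell.
tree = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [
        Node("table", [Node("tr", [Node("td", selected=True), Node("td")])]),
        Node("div", [Node("p")]),
    ]),
])

print(show(prune(tree)))  # html(_, body(table(tr(td*, _)), _))
```

Pruning of this kind shrinks the examples a learner has to generalize from, which is the sense in which it can compensate for few annotations. A schema-guided strategy would additionally consult the schema when deciding which subtrees may be collapsed; that part is not modeled here.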