Inference algorithms for tree automata that define node-selecting queries in unranked trees rely on tree pruning strategies. These strategies impose additional assumptions on node selection, which are needed to compensate for the small number of annotated examples. In query learning algorithms for Web information extraction, pruning-based heuristics often improve learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice on XML information extraction tasks.
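To make "tree automata that define node-selecting queries in unranked trees" concrete, here is a minimal sketch. It assumes one common encoding that is not spelled out above: a deterministic stepwise automaton that reads a node's label together with a selection bit and then absorbs the node's children one at a time. A node is selected if the tree in which exactly that node is marked is accepted. The names `run`, `answers`, `INIT`, `STEP`, `FINAL` and the toy query "select every b-labelled node" are hypothetical illustrations, not the paper's construction.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# An unranked tree is a (label, children) pair; a node is identified by its path of
# child indices from the root, e.g. () is the root and (1, 0) is its second child's first child.
Tree = Tuple[str, List["Tree"]]
Path = Tuple[int, ...]

# Stepwise automaton over labels annotated with a selection bit (an assumed encoding):
#   init[(label, is_marked)] is a node's state before any of its children are read,
#   step[(q, q_child)] is the state after appending one more child in state q_child.
# A missing entry means the automaton blocks, i.e. rejects.
Init = Dict[Tuple[str, bool], int]
Step = Dict[Tuple[int, int], int]


def run(tree: Tree, marked: Path, init: Init, step: Step, here: Path = ()) -> Optional[int]:
    """Evaluate the automaton bottom-up on `tree` with only the node `marked` annotated."""
    label, children = tree
    q = init.get((label, here == marked))
    for i, child in enumerate(children):
        if q is None:
            return None
        q_child = run(child, marked, init, step, here + (i,))
        if q_child is None:
            return None
        q = step.get((q, q_child))
    return q


def nodes(tree: Tree, here: Path = ()) -> Iterator[Path]:
    """All node paths in document (pre-)order."""
    yield here
    for i, child in enumerate(tree[1]):
        yield from nodes(child, here + (i,))


def answers(tree: Tree, init: Init, step: Step, final: Set[int]) -> List[Path]:
    """Node v is selected iff the tree with v marked is accepted (naive: one run per node)."""
    return [p for p in nodes(tree) if run(tree, p, init, step) in final]


# Hypothetical toy query "select every b-labelled node".
# State 0: no marked node seen yet; state 1: the marked node (labelled b) has been seen.
INIT: Init = {("a", False): 0, ("b", False): 0, ("b", True): 1}
STEP: Step = {(0, 0): 0, (0, 1): 1, (1, 0): 1}
FINAL = {1}

DOC: Tree = ("a", [("b", []), ("a", [("b", [])])])
print(answers(DOC, INIT, STEP, FINAL))  # [(0,), (1, 0)]
```

Evaluating once per candidate node is quadratic and only meant to make the semantics explicit. The learning problem described above runs in the opposite direction: it infers such an automaton from annotated example trees.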
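The role of pruning can be illustrated on a single annotated example. The sketch below assumes one simple strategy for illustration only: keep the nodes on paths from the root to annotated answers together with their direct children, and replace every other subtree by a leaf carrying a type assigned by the schema. The function `schema_type` is a hypothetical stand-in for running a deterministic schema automaton; here it just returns the subtree's root label. Neither this strategy nor the `#<type>` leaf notation is claimed to be the exact schema-guided strategy of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    selected: bool = False  # True if the annotated example marks this node as an answer


def has_selected(t: Node) -> bool:
    """Does the subtree rooted at t contain an annotated answer?"""
    return t.selected or any(has_selected(c) for c in t.children)


def prune(t: Node, schema_type: Callable[[Node], str]) -> Node:
    """Keep nodes on root-to-answer paths and their direct children;
    replace every child subtree without an answer by a leaf '#<type>'."""
    kept = []
    for c in t.children:
        if has_selected(c):
            kept.append(prune(c, schema_type))
        else:
            kept.append(Node("#" + schema_type(c)))
    return Node(t.label, kept, t.selected)


def show(t: Node) -> str:
    """Compact term notation; '*' marks annotated answers."""
    head = t.label + ("*" if t.selected else "")
    if not t.children:
        return head
    return head + "(" + ",".join(show(c) for c in t.children) + ")"


# Hypothetical example: one annotated cell in a two-row table.
TABLE = Node("table", [
    Node("tr", [Node("td", [Node("text")]), Node("td", [Node("text", selected=True)])]),
    Node("tr", [Node("td", [Node("text")]), Node("td", [Node("text")])]),
])

# Stand-in for a schema automaton: the "type" of a subtree is just its root label.
print(show(prune(TABLE, lambda n: n.label)))  # table(tr(#td,td(text*)),#tr)
```

Replacing a pruned subtree with a typed leaf instead of a single wildcard keeps the information the schema provides about what was removed. The intent is that a learner working on pruned examples can still tell structurally different contexts apart while ignoring irrelevant detail.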