Efficient and reliable integration of web data requires building programs called wrappers. Writing wrappers by hand is tedious and error-prone. Constant changes in the web also mean that wrappers need to be constantly refactored. Machine learning has proven useful, but current techniques are either limited in expressivity, require non-intuitive user interaction, or do not allow for n-ary extraction. We study the use of tree patterns as an n-ary extraction language and propose an algorithm for learning such queries. It computes the most information-conservative tree pattern that generalizes two input trees. A notable aspect is that the approach can learn queries containing both child and descendant relationships between nodes. More importantly, the proposed approach does not require any labeling other than the data that the user actually wants to extract. The reported experiments show the effectiveness of the approach.
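To make the notion of a tree pattern with child and descendant edges more concrete, the following is a minimal sketch of how such a pattern might be evaluated against a document tree to produce n-ary tuples. It is not the learning algorithm described in the abstract. The names `Node`, `Pattern`, `match`, and `extract` are hypothetical, and the matching semantics (unordered, with a "*" wildcard label) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """A document tree node (hypothetical, simplified DOM element)."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A tree-pattern node (hypothetical representation)."""
    label: str                  # element name, or "*" to accept any label
    var: Optional[str] = None   # if set, bind the matched node's text to this name
    edge: str = "child"         # relation to the parent pattern node: "child" or "descendant"
    children: List["Pattern"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield all proper descendants of node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def match(p: Pattern, n: Node) -> List[Dict[str, str]]:
    """Return every variable binding obtained by matching pattern p at node n."""
    if p.label != "*" and p.label != n.label:
        return []
    base = {p.var: n.text} if p.var else {}
    per_child = []
    for pc in p.children:
        # A child edge looks only at direct children; a descendant edge at any depth.
        candidates = n.children if pc.edge == "child" else descendants(n)
        found = [b for c in candidates for b in match(pc, c)]
        if not found:
            return []  # one sub-pattern cannot be matched, so p fails at n
        per_child.append(found)
    results = []
    for combo in product(*per_child):
        merged = dict(base)
        for binding in combo:
            merged.update(binding)
        results.append(merged)
    return results


def extract(p: Pattern, root: Node) -> List[Dict[str, str]]:
    """Try the pattern at every node of the tree and collect all tuples."""
    out = []
    for n in [root, *descendants(root)]:
        out.extend(match(p, n))
    return out


# Toy document: the price sits at a different depth in each list item.
doc = Node("ul", children=[
    Node("li", children=[Node("b", "Tea"),
                         Node("div", children=[Node("span", "3.50")])]),
    Node("li", children=[Node("b", "Coffee"), Node("span", "4.20")]),
])

# Binary (2-ary) query: a name as a child of li, a price as a descendant of li.
pattern = Pattern("li", children=[
    Pattern("b", var="name", edge="child"),
    Pattern("span", var="price", edge="descendant"),
])

rows = extract(pattern, doc)
```

In this toy example the descendant edge lets a single pattern cover both list items even though the price is nested at different depths. Changing the price edge to "child" would match only the second item.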