Good examples are hard to find, particularly in wrapper induction: picking even one wrong example can spell disaster by yielding overgeneralized or overspecialized wrappers. Such wrappers extract data with low precision or recall unless human experts adjust them at significant cost. Visual OXPath is an open-source, visual wrapper induction system that requires minimal examples and eases wrapper refinement. It often derives the intended wrapper from a single example, using sophisticated heuristics that determine the best set of similar examples. To ease wrapper refinement, it offers a list of wrappers ranked by example similarity and robustness. Visual OXPath gives extensive visual feedback for this refinement, which can be performed without any knowledge of the underlying wrapper language. Where further refinement by a human expert is needed, Visual OXPath profits from being based on OXPath, a declarative wrapper language that extends XPath with a thin layer of features necessary for extraction and page navigation.
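To make the idea of ranking candidate wrappers from a single example concrete, the following is a minimal sketch in Python. It is not Visual OXPath's algorithm and does not use OXPath: it relies on the limited XPath subset in Python's standard `xml.etree.ElementTree`, and the sample page, the four candidate shapes, the feature set, and the scoring rule (mean similarity to the example, then number of matches) are all illustrative assumptions. Robustness is not modeled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (an assumption made for illustration only).
PAGE = """
<html><body>
  <div class="nav"><a href="/home">Home</a></div>
  <ul id="results">
    <li class="item"><span class="title">Flat A</span><span class="price">900</span></li>
    <li class="item"><span class="title">Flat B</span><span class="price">1200</span></li>
    <li class="item"><span class="title">Flat C</span><span class="price">750</span></li>
  </ul>
  <div class="footer"><span class="price">ads from 5</span></div>
</body></html>
""".strip()


def features(node, parent):
    """Toy feature vector: tag, class, parent tag, parent class."""
    p = parent.get(node)
    return (
        node.tag,
        node.get("class"),
        p.tag if p is not None else None,
        p.get("class") if p is not None else None,
    )


def similarity(a, b):
    """Fraction of features on which two nodes agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def candidates(example, parent):
    """Candidate wrappers, from overgeneralized to overspecialized."""
    tag, cls = example.tag, example.get("class")
    p = parent[example]
    gp = parent[p]
    # 1-based positions among same-tag siblings, as ElementTree counts them.
    pos = [c for c in p if c.tag == tag].index(example) + 1
    ppos = [c for c in gp if c.tag == p.tag].index(p) + 1
    return [
        f".//{tag}",                                            # any node with this tag
        f".//{tag}[@class='{cls}']",                            # tag plus class
        f".//{p.tag}[@class='{p.get('class')}']/{tag}[@class='{cls}']",
        f".//{p.tag}[{ppos}]/{tag}[{pos}]",                     # purely positional
    ]


def rank(root, example):
    """Rank candidates by mean similarity to the example, then by coverage."""
    parent = {child: node for node in root.iter() for child in node}
    target = features(example, parent)
    scored = []
    for xpath in candidates(example, parent):
        matches = root.findall(xpath)
        mean_sim = sum(similarity(features(m, parent), target) for m in matches) / len(matches)
        scored.append((xpath, round(mean_sim, 3), len(matches)))
    return sorted(scored, key=lambda s: (-s[1], -s[2]))


root = ET.fromstring(PAGE)
example = root.find(".//li/span[@class='price']")  # the single user-picked example
ranked = rank(root, example)
for xpath, mean_sim, count in ranked:
    print(f"{mean_sim:.3f}  {count}  {xpath}")
```

On this sample page the top-ranked candidate selects the three list prices. The positional candidate matches only the example itself (overspecialized), while the bare `.//span` also picks up titles and the footer text (overgeneralized), so both rank lower.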