We present an empirical evaluation and comparison of two methods for extracting content from HTML: absolute XPath expressions and relative XPath expressions. We argue that relative XPath expressions, although not widely used, should be preferred over absolute XPath expressions when extracting content from human-created Web documents. The robustness evaluation covers four thousand queries executed on several hundred webpages. We show that, when referencing parts of real-world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute ones.
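
To make the distinction concrete, the following minimal sketch contrasts the two kinds of expression on a toy page. It is an illustration only, not the evaluation setup described above: the two page snippets, the `extract` helper, and both path strings are hypothetical, and the "relative" expression is assumed here to mean one anchored on an identifying attribute rather than on the full path from the document root. Python's standard `xml.etree.ElementTree` supports only a subset of XPath, so the pages are kept well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical page as first seen by the extractor.
ORIGINAL = """<html><body>
<div id="nav">Home</div>
<div id="content"><p>Price: 42 EUR</p></div>
</body></html>"""

# The same hypothetical page after a banner is added above the navigation.
CHANGED = """<html><body>
<div id="banner">Sale!</div>
<div id="nav">Home</div>
<div id="content"><p>Price: 42 EUR</p></div>
</body></html>"""

# Absolute-style path: every step from the root, selected by position.
ABSOLUTE = "./body/div[2]/p[1]"
# Relative-style path (assumed form): anchored on an identifying attribute.
RELATIVE = ".//div[@id='content']/p"


def extract(page, path):
    """Return the text of the first node matching path, or None if no match."""
    root = ET.fromstring(page)  # root is the <html> element
    node = root.find(path)
    return node.text if node is not None else None


if __name__ == "__main__":
    for label, page in (("original", ORIGINAL), ("changed", CHANGED)):
        print(label, "absolute:", extract(page, ABSOLUTE))
        print(label, "relative:", extract(page, RELATIVE))
```

In this sketch the positional path stops matching once the banner shifts the `div` order, while the attribute-anchored path still reaches the same paragraph. A single toy page says nothing about average robustness, which is what the evaluation measures.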