In spite of the increasing prevalence of the XPath language, including its use with XSLT, little attention has been paid to empirical study of robust pointing by XPath expressions. The goal of this study is to draw practical implications about the robustness of XPath expressions, based on four kinds of real-life HTML pages. For each DOM node in the sample pages, we created three types of XPath expressions and investigated to what extent those expressions continued to point at the same node in the modified pages during a four-month observation period. The types of XPath expressions we investigated include not only absolute expressions, which simply follow the hierarchy of the document tree from the root to the target element, but also relative expressions, which point to a target element in relation to some stable anchor position. As anchor nodes for the relative expressions, we used elements with an href attribute: the URL assigned to an href value should not change often, so an href reference should be relatively stable as a semantic descriptor. In this paper, we briefly introduce XPath authoring support for an annotation editor and explain the types of XPath expressions. An empirical evaluation of the XPath expressions is then presented. Finally, we discuss the advantages and limitations of the XPath expressions, taking account of the actual page modifications, and investigate possibilities for further improving the addressing method.
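The contrast between absolute and href-anchored relative expressions can be illustrated with a small sketch. It assumes well-formed XHTML parsed with Python's standard `xml.etree.ElementTree` (real-world HTML would need an HTML parser first). The helper names (`absolute_xpath`, `relative_xpath`, `resolve`) and the exact relative form (climb from an `a[@href]` anchor with `..`, then descend with positional steps) are hypothetical illustrations, not the expression format used in the study, which defines three expression types; only two are sketched here. ElementTree evaluates only a subset of XPath, which is why the relative form uses `..` rather than an `ancestor::` axis.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    # ElementTree has no parent pointers, so build a child -> parent map once.
    return {child: parent for parent in root.iter() for child in parent}

def step(node, parents):
    # One location step: tag name plus 1-based position among same-tag siblings.
    parent = parents.get(node)
    if parent is None:
        return f"{node.tag}[1]"
    same_tag = [c for c in parent if c.tag == node.tag]
    return f"{node.tag}[{same_tag.index(node) + 1}]"

def ancestors(node, parents):
    # Ancestors ordered from the parent upward to the root.
    out = []
    node = parents.get(node)
    while node is not None:
        out.append(node)
        node = parents.get(node)
    return out

def absolute_xpath(node, parents):
    # Absolute expression: follow the hierarchy from the root down to the target.
    steps = [step(n, parents) for n in [node] + ancestors(node, parents)]
    return "/" + "/".join(reversed(steps))

def relative_xpath(target, parents):
    # Hypothetical relative scheme: find the lowest ancestor of the target that
    # contains an <a href>, climb from that anchor to the ancestor with "..",
    # then descend to the target with positional steps.
    for container in ancestors(target, parents):
        anchors = [a for a in container.iter("a")
                   if "href" in a.attrib and a is not container]
        if not anchors:
            continue
        anchor = anchors[0]  # assumption: the first anchor found is used
        up = ancestors(anchor, parents).index(container) + 1
        down = []
        node = target
        while node is not container:
            down.append(step(node, parents))
            node = parents[node]
        steps = [".."] * up + list(reversed(down))
        return f"//a[@href='{anchor.get('href')}']/" + "/".join(steps)
    return None  # no href anchor anywhere above the target

def resolve(root, xpath):
    # ElementTree's find() only evaluates paths relative to an element,
    # so rewrite both expression forms into that shape.
    if xpath.startswith("//"):
        return root.find("." + xpath)
    steps = xpath.lstrip("/").split("/")
    if steps[0] != f"{root.tag}[1]":
        return None
    return root.find("/".join(["."] + steps[1:]))

PAGE_V1 = ("<html><body>"
           "<div><a href='/news'>News</a><p>Top story</p></div>"
           "</body></html>")
# A later version of the page: a banner <div> is inserted before the content.
PAGE_V2 = ("<html><body><div>Banner</div>"
           "<div><a href='/news'>News</a><p>Top story</p></div>"
           "</body></html>")

root1 = ET.fromstring(PAGE_V1)
parents1 = parent_map(root1)
target = root1.find(".//p")
abs_expr = absolute_xpath(target, parents1)  # "/html[1]/body[1]/div[1]/p[1]"
rel_expr = relative_xpath(target, parents1)  # "//a[@href='/news']/../p[1]"

root2 = ET.fromstring(PAGE_V2)
print(resolve(root2, abs_expr))       # None: the absolute path now misses
print(resolve(root2, rel_expr).text)  # Top story: the relative path still hits
```

In this toy modification, inserting one sibling shifts the positional step `div[1]` and breaks the absolute expression, while the relative expression survives because its anchor URL is unchanged. The sketch would still break if the anchor's href changed or if several anchors shared the same URL, which is the kind of limitation the evaluation examines against real page modifications.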