Although deep web analysis has been studied extensively, there is no succinct formalism for describing user interactions with AJAX-enabled web applications. To this end, we introduce OXPath, a superset of XPath 1.0. Beyond XPath, OXPath is able (1) to fill in web forms and trigger DOM events, (2) to access dynamically computed CSS attributes, (3) to navigate between visible form fields, and (4) to mark relevant information for extraction. In this way, OXPath expressions can closely simulate the human interaction relevant for navigation rather than relying exclusively on the HTML structure. As a result, they are quite resilient to technical changes in the underlying pages. We demonstrate the expressiveness and practical efficacy of OXPath by tackling a group flight planning problem: using the OXPath implementation and its visual interface, we access the popular, highly scripted travel site Kayak and show how to formulate OXPath expressions that extract all booking information with just a few lines of code.
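Features (2) and (3) are connected. Whether a form field counts as "visible" depends on the CSS a browser computes after scripts and stylesheets have run, not on the HTML source alone. The Python sketch below illustrates that idea on a hypothetical, simplified DOM model. The `Node` class and the functions `is_visible_field` and `following_visible_field` are illustrative names chosen here; they are not OXPath syntax, its API, or its implementation. OXPath evaluates such steps against a live browser page, whereas this sketch assumes the computed styles are already available.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, like DOM nodes
class Node:
    """Hypothetical, simplified DOM element (not OXPath's data model)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    # Computed CSS properties as a browser would report them after scripts and
    # stylesheets have run; assumed precomputed for this sketch.
    computed_style: dict = field(default_factory=dict)


FORM_FIELD_TAGS = {"input", "select", "textarea", "button"}


def is_visible_field(node: Node) -> bool:
    """True if the node is a form control that a user could actually see."""
    if node.tag not in FORM_FIELD_TAGS:
        return False
    if node.tag == "input" and node.attrs.get("type") == "hidden":
        return False
    # Visibility is decided by computed CSS, not by the HTML source alone.
    if node.computed_style.get("display") == "none":
        return False
    if node.computed_style.get("visibility") == "hidden":
        return False
    return True


def following_visible_field(doc_order: list, current: Node) -> Optional[Node]:
    """First visible form field after `current` in document order, or None."""
    start = doc_order.index(current) + 1  # identity lookup thanks to eq=False
    for node in doc_order[start:]:
        if is_visible_field(node):
            return node
    return None


# Usage: a hypothetical flight search form containing a hidden token field
# and a field hidden by a script via CSS.
origin = Node("input", {"name": "origin"})
token = Node("input", {"type": "hidden", "name": "token"})
promo = Node("input", {"name": "promo"}, {"display": "none"})
destination = Node("input", {"name": "destination"})
page = [Node("form"), origin, token, promo, Node("label"), destination]

nxt = following_visible_field(page, origin)
print(nxt.attrs["name"])  # destination
```

A purely structural query on the HTML would treat `token` and `promo` as ordinary inputs. Navigating by what is visible skips them the way a human user would.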