The World Wide Web provides access to a wealth of data. Collecting and maintaining data at this scale requires automated extraction, since automation can perform extraction tasks that would otherwise be infeasible. Modern web interfaces, however, are designed primarily for human users and deliver sophisticated interactions through client-side scripting and asynchronous server communication, which makes automated extraction difficult. To address this, we introduce OXPath, a careful extension of XPath that facilitates data extraction from the deep web. OXPath exploits XPath's familiarity and theoretical foundations, and thereby achieves favourable evaluation complexity and optimal page buffering, storing only a constant number of pages for non-recursive queries. Further, OXPath provides a lightweight interface that is easy to use and embed. This paper outlines the motivation, theoretical framework, current implementation, and preliminary results. We conclude with proposed future work on OXPath, including an investigation of how to deploy it efficiently in a highly elastic (cloud) computing framework.