Web automation applications are widely used for purposes such as B2B integration, automated testing of web applications, and technology and business watch. A crucial requirement for these applications is the ability to easily generate and reproduce navigation sequences. This problem is especially difficult for the new breed of AJAX-based websites. Although some recent tools have addressed the problem, they show limitations either in usability or in their ability to deal with complex websites. In this paper, we propose a set of new techniques for building an automatic web navigation system able to deal with these complexities. Our main contributions are: a new method for recording navigation sequences that scales to a wider range of events, an algorithm that identifies the target element of a user action in a change-resilient manner, and a novel method for detecting when the effects caused by a user action (including the effects of scripting code and AJAX requests) have finished. In addition, we have tested our approach on a large number of real web sources and compared it with other relevant web automation tools, obtaining very good results.