As Web applications grow in number and quality, vertical solutions can use them as an important source of information. Obtaining information from web sources is nevertheless challenging, because access is complicated by the hypertext browsing paradigm and by HTML's semistructured format. Web Automation middleware navigates through web links and fills in web forms automatically in order to extract information from the Hidden Web. The main optimization parameter is the time required to navigate through the intermediate pages that lead to the desired results. This work proposes a technique that reduces browsing time by storing information from previous queries and using it to preload an adequate subset of the navigational sequence on a specific browser before the next sequence is launched. It also takes into account which sequences are used most often, so that those are preloaded more frequently.
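
The preloading idea can be illustrated with a minimal sketch. This is not the paper's algorithm, and the abstract does not say how the "adequate subset" is chosen. The sketch assumes, purely for illustration, that each past query is stored as a list of step names and that the subset to preload is the longest leading run of steps shared by at least a given fraction of past queries. The function name `plan_preload`, the `min_share` parameter, and the step names are all hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Step = str  # hypothetical: one navigation step, e.g. "home" or "login"


def plan_preload(history: Sequence[Sequence[Step]],
                 min_share: float = 0.5) -> Tuple[Step, ...]:
    """Pick the steps to preload before the next query arrives.

    Assumption (not from the paper): the prefix to preload is the longest
    leading run of steps that at least `min_share` of past queries began
    with. Ties in length are broken in favour of the more frequent prefix.
    """
    total = len(history)
    if total == 0:
        return ()

    # Count how many past sequences start with each possible prefix.
    prefix_counts: Counter = Counter()
    for sequence in history:
        for end in range(1, len(sequence) + 1):
            prefix_counts[tuple(sequence[:end])] += 1

    # Keep only prefixes that are common enough to be worth preloading.
    frequent = [(prefix, count) for prefix, count in prefix_counts.items()
                if count / total >= min_share]
    if not frequent:
        return ()

    # Prefer the longest prefix, then the most frequently used one.
    best_prefix, _ = max(frequent, key=lambda item: (len(item[0]), item[1]))
    return best_prefix


# Hypothetical history of four past navigation sequences.
past_queries = [
    ["home", "login", "search_form"],
    ["home", "login", "search_form"],
    ["home", "login", "reports"],
    ["home", "help"],
]
steps_to_preload = plan_preload(past_queries, min_share=0.6)
```

With this history and `min_share=0.6`, the plan is `("home", "login")`, since three of the four past queries began with those two steps. A lower threshold preloads a longer prefix at the risk of wasting work when the next query diverges earlier.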