Proceedings of the 27th International Conference on Very Large Data Bases
Using the structure of Web sites for automatic segmentation of tables
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Fully automatic wrapper generation for search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
Query Selection Techniques for Efficient Crawling of Structured Web Sources
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Communications of the ACM - ACM at sixty: a look back in time
Wise-integrator: an automatic integrator of web search interfaces for E-commerce
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
An Approach to Deep Web Crawling by Sampling
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Crawling Deep Web Using a New Set Covering Algorithm
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Hi-index | 0.00 |
Web database crawling is one of the major kinds of design choices solution for Deep Web data integration. To the best of our knowledge, the existing works only focused on how to crawl all records in a web database at one time. Due to the high dynamic of web databases, it is not practical to always crawl the whole database in order to harvest a small proportion of new records. To this end, this paper studies the problem of incremental web database crawling, which targets at crawling the new records from a web database as many as possible while minimizing the communication costs. In our approach, a new graph model, an incremental crawling task is transformed into a graph traversal process. Based on this graph, appropriate queries are generated for crawling by analyzing the history versions of the web database. Extensive experimental evaluations over real Web databases validate the effectiveness of our techniques and provide insights for future efforts in this direction.