Web sites, Web pages, and the data on those pages are available only for limited periods of time; from a client's point of view they are deleted afterwards. Retrieving information from the Web therefore requires taking into account how Web information changes over time. Both push and pull strategies may be applied to this task. Since push services are usually not available, pull strategies have to be designed to optimize the retrieved information with respect to the age and the completeness of the retrieved data. In this article we present a new procedure that optimizes data retrieval from Web pages by page decomposition. Using an automatic wrapper induction technique, a page is decomposed into functional segments. Each segment is treated as an independent component in the analysis of the page's temporal behavior. Based on this decomposition we present a new component-based download strategy. Applying this method to Web pages shows that, for a fraction of Web data, the freshness of the retrieved data can be improved significantly compared to traditional methods.
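To make the idea of per-component time analysis concrete, the following is a minimal sketch and not the procedure of the article itself. It assumes that each segment changes according to an independent Poisson process and that the page has been polled at a fixed interval; both are assumptions introduced here. The function names, the segment names, and the change histories are hypothetical. Under these assumptions, the change rate of a segment can be estimated from how often a poll detected a change, and the time-averaged freshness of a copy refreshed every `poll_interval` time units follows in closed form.

```python
import math


def estimate_change_rate(changed_flags, poll_interval):
    """Estimate a Poisson change rate from regular polls (illustrative only).

    changed_flags[i] is True if the segment differed from the previous poll.
    With P(change detected) = 1 - exp(-rate * poll_interval), the maximum
    likelihood estimate is -ln(1 - x/n) / poll_interval.
    """
    n = len(changed_flags)
    if n == 0:
        raise ValueError("need at least one observation")
    x = sum(changed_flags)
    if x == n:
        # Every poll saw a change, so the estimate would be infinite.
        # Clamping x is an ad hoc choice made for this sketch.
        x = n - 0.5
    return -math.log(1.0 - x / n) / poll_interval


def expected_freshness(rate, poll_interval):
    """Time-averaged probability that a copy refreshed every poll_interval
    is still up to date, under the Poisson assumption."""
    if rate == 0.0:
        return 1.0
    z = rate * poll_interval
    return (1.0 - math.exp(-z)) / z


# Hypothetical change histories of two segments of one page, polled hourly.
histories = {
    "headline": [True, True, False, True],
    "footer": [False, False, False, False],
}
# Treating the page as a whole: it counts as changed if any segment changed.
page_flags = [any(flags) for flags in zip(*histories.values())]

component_rates = {
    name: estimate_change_rate(flags, 1.0) for name, flags in histories.items()
}
page_rate = estimate_change_rate(page_flags, 1.0)
```

In this toy setting the page-level rate is at least as large as the rate of any single segment, so a schedule tuned to the whole page is poorly matched to a slowly changing segment such as the hypothetical `footer`. Estimating rates per segment lets the reload interval follow the segment that is actually of interest.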