The Web is becoming a universal information dissemination medium, due to a number of factors including its support for content dynamicity. A growing number of Web information providers post near real-time updates in domains such as auctions, stock markets, bulletin boards, news, weather, roadway conditions, and sports scores. External parties often wish to capture this information for a wide variety of purposes, ranging from online data mining to automated synthesis of information from multiple sources. There has been a great deal of work on the design of systems that can process streams of data from Web sources, but little attention has been paid to how to produce these data streams, given that Web pages generally require "pull-based" access. In this paper we introduce a new general-purpose algorithm for monitoring Web information sources, effectively converting pull-based sources into push-based ones. Our algorithm can be used in conjunction with continuous query systems that assume information is fed into the query engine in a push-based fashion. Ideally, a Web monitoring algorithm for this purpose should achieve two objectives: (1) timeliness and (2) completeness of the information captured. However, we demonstrate, both analytically and empirically using real-world data, that these objectives are fundamentally at odds. When the resources available for Web monitoring are limited and the number of sources to monitor is large, it may be necessary to sacrifice some timeliness to achieve better completeness, or vice versa. To take this into account, our algorithm is highly parameterized and targets an application-specified balance between timeliness and completeness. In this paper we formalize the problem of optimizing for a flexible combination of timeliness and completeness, and prove that our parameterized algorithm is a 2-approximation in all cases and is optimal in certain cases.