Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user wants the value of some function over distributed data items, for example, to determine when and whether (a) the traffic entering a highway from multiple feed roads will result in congestion in a thoroughfare or (b) the value of a stock portfolio exceeds a threshold. Using the standard Web infrastructure for these applications will increase the reach of the underlying information. But since these queries involve data from multiple sources, with sources supporting standard HTTP (pull-based) interfaces, special query processing techniques are needed. Also, these applications can often tolerate some incoherency, i.e., some difference between the results reported to the user and those produced from the virtual database made up of the distributed data sources.

In this paper, we develop and evaluate client-pull-based techniques for refreshing data so that the results of queries over distributed data can be correctly reported, conforming to the limited incoherency acceptable to the users.

We model and estimate the dynamics of the data items using a probabilistic approach based on Markov chains. Depending on the dynamics of the data, we adapt the data refresh times to deliver query results with the desired coherency. The commonality of data needs across multiple queries is exploited to further reduce refresh overheads. The effectiveness of our approach is demonstrated using live sources of dynamic data: the number of refreshes it requires is (a) an order of magnitude less than what we would need if every potential update were pulled from the sources, and (b) comparable to the number of messages needed by an ideal algorithm, one that knows how to optimally refresh the data from distributed data sources. Our evaluations also bring out a very practical and attractive tradeoff property of pull-based approaches: a small increase in tolerable incoherency leads to a large decrease in message overheads.
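To make the idea of adapting refresh times to data dynamics concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes time is divided into ticks, that each tick's change is recorded as a unit move (-1, 0, or +1), and that a first-order Markov chain over those moves is an adequate model. The function names, the `risk` parameter, and the `max_ticks` cap are all hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def estimate_transitions(moves):
    """Estimate first-order Markov transition probabilities between per-tick moves."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(moves, moves[1:]):
        counts[prev][cur] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {cur: n / total for cur, n in row.items()}
    return probs


def violation_probability(probs, last_move, bound, ticks):
    """Probability that |drift since the last pull| exceeds `bound` within `ticks` ticks."""
    # dist maps (last move, drift so far) -> probability of paths still within the bound.
    dist = {(last_move, 0): 1.0}
    violated = 0.0
    for _ in range(ticks):
        nxt = defaultdict(float)
        for (move, drift), p in dist.items():
            # Assumption: a move with no observed successor is followed by "no change".
            for new_move, q in probs.get(move, {0: 1.0}).items():
                new_drift = drift + new_move
                if abs(new_drift) > bound:
                    violated += p * q
                else:
                    nxt[(new_move, new_drift)] += p * q
        dist = nxt
    return violated


def ticks_until_refresh(probs, last_move, bound, risk=0.1, max_ticks=60):
    """Longest wait (at least one tick) whose violation probability stays within `risk`."""
    wait = 1
    while wait < max_ticks and violation_probability(probs, last_move, bound, wait + 1) <= risk:
        wait += 1
    return wait
```

Under these assumptions, a larger tolerable incoherency (`bound`) never shortens the wait before the next pull, which mirrors, in a toy setting, the tradeoff between incoherency and message overhead described above. For a data item that rises by one unit every tick, for example, the sketch waits 3 ticks with a bound of 3 and 6 ticks with a bound of 6.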