Executing incoherency bounded continuous queries at web data aggregators

Authors:
Rajeev Gupta;Ashish Puri;Krithi Ramamritham
Affiliations:
IBM India Research Laboratory, New Delhi, India;Indian Institute of Technology, Bombay, Mumbai, India;Indian Institute of Technology, Bombay, Mumbai, India
Venue:
WWW '05 Proceedings of the 14th international conference on World Wide Web
Year:
2005

Abstract

Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user desires to obtain the value of some function over distributed data items, for example, to determine when and whether (a) the traffic entering a highway from multiple feed roads will result in congestion in a thoroughfare or (b) the value of a stock portfolio exceeds a threshold. Using the standard Web infrastructure for these applications will increase the reach of the underlying information. However, since these queries involve data from multiple sources, with sources supporting standard HTTP (pull-based) interfaces, special query processing techniques are needed. Also, these applications often have the flexibility to tolerate some incoherency, i.e., some differences between the results reported to the user and those produced from the virtual database made up of the distributed data sources.
In this paper, we develop and evaluate client-pull-based techniques for refreshing data so that the results of the queries over distributed data can be correctly reported, conforming to the limited incoherency acceptable to the users.
We model as well as estimate the dynamics of the data items using a probabilistic approach based on Markov chains. Depending on the dynamics of the data, we adapt the data refresh times to deliver query results with the desired coherency. The commonality of data needs of multiple queries is exploited to further reduce refresh overheads. The effectiveness of our approach is demonstrated using live sources of dynamic data: the number of refreshes it requires is (a) an order of magnitude smaller than what we would need if every potential update were pulled from the sources, and (b) comparable to the number of messages needed by an ideal algorithm, one that knows how to optimally refresh the data from distributed data sources.
Our evaluations also bring out a very practical and attractive tradeoff property of pull-based approaches, e.g., a small increase in tolerable incoherency leads to a large decrease in message overheads.
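
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of Markov-chain-driven pull scheduling, not the authors' method. It assumes a data item's value is discretized into integer levels sampled at fixed ticks, estimates a first-order transition matrix from past samples, and picks the longest refresh delay for which the probability of drifting beyond the incoherency bound stays under a chosen risk. All function names and parameters (`estimate_transitions`, `violation_probability`, `next_refresh_delay`, `max_risk`, `horizon`) are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(states):
    """Estimate first-order Markov transition probabilities from a state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: n / total for nxt, n in row.items()}
    return probs


def violation_probability(probs, start, bound, steps):
    """Probability that the level moves more than `bound` away from `start`
    at some point within `steps` ticks (violating states are absorbing)."""
    inside = {start: 1.0}  # probability mass that has stayed within the bound
    violated = 0.0
    for _ in range(steps):
        next_inside = defaultdict(float)
        for state, mass in inside.items():
            # Assumption: a level never observed as a source stays unchanged.
            row = probs.get(state, {state: 1.0})
            for nxt, p in row.items():
                if abs(nxt - start) > bound:
                    violated += mass * p
                else:
                    next_inside[nxt] += mass * p
        inside = next_inside
    return violated


def next_refresh_delay(probs, start, bound, max_risk, horizon):
    """Longest delay in ticks (at least 1, at most `horizon`) whose
    violation probability does not exceed `max_risk`."""
    for steps in range(1, horizon + 1):
        if violation_probability(probs, start, bound, steps) > max_risk:
            return max(1, steps - 1)
    return horizon


if __name__ == "__main__":
    history = [0, 1, 2, 3, 4, 5]  # made-up samples: the level rises by one each tick
    model = estimate_transitions(history)
    print(next_refresh_delay(model, start=0, bound=2, max_risk=0.1, horizon=10))
    print(next_refresh_delay(model, start=0, bound=4, max_risk=0.1, horizon=10))
```

In this toy model, widening `bound` lengthens the computed delay, which is the kind of tradeoff between tolerable incoherency and message overhead that the abstract describes. A realistic scheduler would also need to re-estimate the transition probabilities as new samples arrive.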