This paper addresses the problem of minimizing the staleness of query results for streaming applications with update semantics under overload conditions. Staleness is a measure of how out-of-date the results are compared with the latest data arriving on the input. Real-time streaming applications are subject to overload due to unpredictably increasing data rates. In many of them, we observe that data streams and queries exhibit "update semantics": only the latest input data matter when producing a query result. Under such semantics, overload causes staleness to build up. The key to avoiding this is to exploit the update semantics of applications as early as possible in the processing pipeline. In this paper, we propose UpStream, a storage-centric framework for load management over streaming applications with update semantics. We first describe how we model streams and queries that possess update semantics, providing definitions of correctness and staleness for the query results. Then, we show how staleness can be minimized using intelligent update key scheduling techniques applied at the queue level, while preserving the correctness of the results, even for complex queries that involve sliding windows. UpStream is based on the simple idea of applying updates in place, yet it yields substantial reductions in staleness and memory consumption, as we verify experimentally on the Borealis system.
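To make the idea of applying updates in place at the queue level more concrete, the following is a minimal sketch, not UpStream's actual implementation. It assumes each tuple carries an update key and that a newer tuple for a key makes the pending one obsolete. The class name `InPlaceUpdateQueue`, its methods, and the plain FIFO ordering among keys are all illustrative assumptions; the abstract does not specify the scheduling policies the framework uses, and sliding-window queries are not handled here.

```python
from collections import OrderedDict


class InPlaceUpdateQueue:
    """Hypothetical queue that holds at most one pending tuple per update key."""

    def __init__(self):
        # Maps update key -> latest pending value, in order of first arrival.
        self._pending = OrderedDict()

    def enqueue(self, key, value):
        # Assigning to an existing key overwrites the value but keeps the
        # key's original position, so the update is applied in place.
        self._pending[key] = value

    def dequeue(self):
        # Assumed policy: serve keys in FIFO order of first arrival.
        # Raises KeyError if the queue is empty.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    q = InPlaceUpdateQueue()
    q.enqueue("sensor-1", 20.5)
    q.enqueue("sensor-2", 18.0)
    q.enqueue("sensor-1", 21.0)  # replaces the stale reading for sensor-1
    print(len(q))       # 2
    print(q.dequeue())  # ('sensor-1', 21.0)
    print(q.dequeue())  # ('sensor-2', 18.0)
```

In this sketch the queue length is bounded by the number of distinct keys rather than by the input rate, which illustrates why in-place updates can lower both memory consumption and staleness under overload.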