Scheduling to minimize staleness and stretch in real-time data warehouses

Authors:
Mohammad Hossein Bateni; Lukasz Golab; Mohammad Taghi Hajiaghayi; Howard Karloff
Affiliations:
Princeton University, Princeton, NJ, USA; AT&T Labs - Research, Florham Park, NJ, USA; AT&T Labs - Research, Florham Park, NJ, USA; AT&T Labs - Research, Florham Park, NJ, USA
Venue:
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Year:
2009

Abstract

We study scheduling algorithms for loading data feeds into real-time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. Data for each table are generated at a constant rate, which may differ from table to table. For each data feed, the arrival of new data triggers an update that seeks to append the new data to the corresponding table; if multiple updates are pending for the same table, they are batched together before being loaded. At time τ, if a table has been updated with information up to time r ≤ τ, its staleness is defined as τ − r. Our first objective is to schedule the updates on one or more processors in a way that minimizes the total staleness. To ensure fairness, our second objective is to limit the maximum "stretch", which we define (roughly) as the ratio between the time an update waits until it has finished being processed and the length of the update. In contrast to earlier work proving the nonexistence of constant-competitive algorithms for related scheduling problems, we prove that any online nonpreemptive algorithm in which no processor is ever voluntarily idle incurs a total staleness at most a constant factor larger than an obvious lower bound on total staleness (provided that the processors are sufficiently fast). For the quasiperiodic model, in which tables can be clustered into a few groups such that the update frequencies within each group vary by at most a constant factor, we give a constant-stretch algorithm, again provided that the processors are sufficiently fast.
Finally, we show that our constant-stretch algorithm is also constant-competitive (subject to the same proviso on processor speed) in the quasiperiodic model with respect to total weighted staleness, where tables are assigned weights that reflect their priorities.
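
The definitions above can be made concrete with a small simulation. It is a minimal sketch under assumptions that are not taken from the paper: each table receives one update every `period` time units; loading a batch that covers d time units of data takes `alpha * d` time on one processor (a hypothetical cost model, with small `alpha` standing in for "sufficiently fast"); total staleness is the time integral of τ − r summed over tables, scaled by a hypothetical per-table `weight`; and the stretch of an update is read as its wait from arrival to load completion divided by its length, taken here to be `period`. The scheduler is a generic online, nonpreemptive, non-idling policy with a hypothetical "largest weighted staleness first" rule; it is not the authors' constant-stretch algorithm. The names `Table`, `simulate`, `staleness_area`, and `alpha` are illustrative.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Table:
    period: float          # assumption: one update of length `period` arrives every `period` time units
    weight: float = 1.0    # hypothetical priority used for weighted staleness
    fresh_to: float = 0.0  # r: the table holds data up to time r
    busy: bool = False     # True while a batch for this table is being loaded


def staleness_area(r, t_a, t_b):
    """Integral of (tau - r) d tau over [t_a, t_b], with freshness r held fixed."""
    return ((t_b - r) ** 2 - (t_a - r) ** 2) / 2.0


def simulate(tables, m, alpha, horizon):
    """Online, nonpreemptive, non-idling scheduling of table updates on m processors.

    Assumed cost model: loading a batch covering d time units of data takes alpha * d.
    Returns (weighted total staleness over [0, horizon], max stretch over started batches).
    """
    procs = [(0.0, k) for k in range(m)]  # (time the processor becomes free, processor id)
    heapq.heapify(procs)
    in_flight = []                        # (finish time, table index, new r)
    completions = [[] for _ in tables]    # per table: (finish time, new r)
    max_stretch = 0.0
    while procs:
        now, k = heapq.heappop(procs)
        if now >= horizon:
            continue                      # processor retires at the horizon
        # Apply every load that has completed by `now`.
        while in_flight and in_flight[0][0] <= now:
            _, i, new_r = heapq.heappop(in_flight)
            tables[i].fresh_to, tables[i].busy = new_r, False
        # A table is pending if data newer than fresh_to has arrived and it is not being loaded.
        pending = [i for i, tb in enumerate(tables)
                   if not tb.busy and math.floor(now / tb.period) * tb.period > tb.fresh_to]
        if not pending:
            # No work available: wait for the next arrival or completion (not voluntary idling).
            nxt = min((math.floor(now / tb.period) + 1) * tb.period for tb in tables)
            if in_flight:
                nxt = min(nxt, in_flight[0][0])
            heapq.heappush(procs, (nxt, k))
            continue
        # Hypothetical selection rule: largest weighted staleness first.
        i = max(pending, key=lambda j: tables[j].weight * (now - tables[j].fresh_to))
        tb = tables[i]
        latest = math.floor(now / tb.period) * tb.period
        finish = now + alpha * (latest - tb.fresh_to)  # all pending updates loaded as one batch
        oldest_arrival = tb.fresh_to + tb.period       # the oldest update in the batch waited longest
        max_stretch = max(max_stretch, (finish - oldest_arrival) / tb.period)
        tb.busy = True
        heapq.heappush(in_flight, (finish, i, latest))
        completions[i].append((finish, latest))
        heapq.heappush(procs, (finish, k))
    # Integrate the piecewise staleness tau - r(tau) of each table over [0, horizon].
    total = 0.0
    for i, tb in enumerate(tables):
        r, t = 0.0, 0.0
        for finish, new_r in completions[i]:
            if finish >= horizon:
                break
            total += tb.weight * staleness_area(r, t, finish)
            r, t = new_r, finish
        total += tb.weight * staleness_area(r, t, horizon)
    return total, max_stretch


if __name__ == "__main__":
    # One table, one processor, loading at twice real-time speed: prints (3.5, 0.5).
    print(simulate([Table(period=1.0)], m=1, alpha=0.5, horizon=4.0))
```

In this toy model, raising `alpha` above 1 makes the batches grow over time, which gives a small-scale view of why the results above assume sufficiently fast processors.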