Run-time operator state spilling for memory intensive long-running queries

Authors:
Bin Liu; Yali Zhu; Elke Rundensteiner
Affiliations:
Worcester Polytechnic Institute; Worcester Polytechnic Institute; WPI
Venue:
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Year:
2006

Abstract

Main memory is a critical resource when processing long-running queries over data streams with state-intensive operators. In this work, we investigate state spill strategies that handle run-time memory shortage when processing such complex queries by selectively pushing operator states to disk. Unlike previous solutions, which all focus on a single operator, we target queries with multiple state-intensive operators. We observe an interdependency among the operators in the query plan when spilling operator states, and we illustrate that existing strategies, which do not take this interdependency into account, become largely ineffective in this query context. A consolidated plan-level spill strategy is therefore needed. This paper proposes several data spill strategies to maximize run-time query throughput in memory-constrained environments. The bottom-up state spill strategy is an operator-level strategy that treats all data in one operator state equally. We then propose more sophisticated partition-level data spill strategies that take different characteristics of the input data into account: the local output, the global output, and the global output with penalty strategies. All proposed state spill strategies have been implemented in the D-CAPE continuous query system. The experimental results confirm the effectiveness of the proposed strategies. In particular, the global output and global output with penalty strategies show favorable results compared with the two more localized strategies, bottom-up and local output.
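The difference between a localized and a plan-level view of spilling can be illustrated with a small sketch. It assumes each state partition tracks its memory size and two hypothetical counters: results produced at its own operator (`local_output`) and results that eventually reached the query's final output (`global_output`). The "output per unit of memory" score and the greedy victim selection are assumptions made for illustration, not the paper's exact formulas; the bottom-up strategy and the penalty variant are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Partition:
    """Hypothetical model of one hash partition of an operator's state."""
    pid: str
    size: int           # memory held by the partition (assumed > 0)
    local_output: int   # results produced at the partition's own operator
    global_output: int  # results that eventually reached the query's final output


def local_output_score(p: Partition) -> float:
    """Assumed metric: output at the local operator per unit of memory."""
    return p.local_output / p.size


def global_output_score(p: Partition) -> float:
    """Assumed metric: contribution to the final query output per unit of memory."""
    return p.global_output / p.size


def choose_victims(partitions: List[Partition], needed: int,
                   score: Callable[[Partition], float]) -> List[Partition]:
    """Greedily pick the lowest-scoring partitions until at least `needed`
    units of memory are freed (or every partition has been picked)."""
    victims: List[Partition] = []
    freed = 0
    for p in sorted(partitions, key=score):
        if freed >= needed:
            break
        victims.append(p)
        freed += p.size
    return victims


if __name__ == "__main__":
    # A looks busy locally, but none of its intermediate results survive downstream.
    parts = [
        Partition("A", size=100, local_output=500, global_output=0),
        Partition("B", size=100, local_output=200, global_output=150),
        Partition("C", size=100, local_output=50, global_output=40),
    ]
    for name, fn in [("local", local_output_score), ("global", global_output_score)]:
        print(name, [p.pid for p in choose_victims(parts, 150, fn)])
```

With 150 units to free, the local score spills C and B, while the global score spills A and C and keeps B, the partition that actually feeds the final output. In miniature, this mirrors the interdependency argument in the abstract: a partition's local productivity can be a poor proxy for its value to the whole plan.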