Sharing data and work across concurrent analytical queries

Authors:
Iraklis Psaroudakis;Manos Athanassoulis;Anastasia Ailamaki
Affiliations:
École Polytechnique Fédérale de Lausanne;École Polytechnique Fédérale de Lausanne;École Polytechnique Fédérale de Lausanne
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Citing 24
Cited 0

Multiple-query optimization

ACM Transactions on Database Systems (TODS)
The LRU-K page replacement algorithm for database disk buffering

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
View indexing in relational databases

ACM Transactions on Database Systems (TODS)
Efficient and extensible algorithms for multi query optimization

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Pipelining in multi-query optimization

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Batch Scheduling in Parallel Database Systems

Proceedings of the Ninth International Conference on Data Engineering
2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Redbrick Vista: Aggregate Computation and Management

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Dynamic Caching of Query Results for Decision Support Systems

SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
QPipe: a simultaneously pipelined relational query engine

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
ARC: A Self-Tuning, Low Overhead Replacement Cache

FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
An evaluation of buffer management strategies for relational database systems

VLDB '85 Proceedings of the 11th international conference on Very Large Data Bases - Volume 11
Adaptive aggregation on chip multiprocessors

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
To share or not to share?

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cooperative scans: dynamic bandwidth sharing in a DBMS

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
Main-memory scan sharing for multi-core CPUs

Proceedings of the VLDB Endowment
Shore-MT: a scalable storage manager for the multicore era

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
A scalable, predictable join operator for highly concurrent data warehouses

Proceedings of the VLDB Endowment
Predictable performance for unpredictable workloads

Proceedings of the VLDB Endowment
The DataPath system: a data-centric analytic processing engine for large data warehouses

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Predictable performance and high query concurrency for data analytics

The VLDB Journal — The International Journal on Very Large Data Bases
SharedDB: killing one thousand queries with one stone

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the query-centric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We evaluate experimentally two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators. First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP give the best results.