A scalable, predictable join operator for highly concurrent data warehouses

Authors: George Candea; Neoklis Polyzotis; Radek Vingralek
Affiliations: EPFL & Aster Data; U. C. Santa Cruz; Aster Data
Venue: Proceedings of the VLDB Endowment
Year: 2009

Abstract

Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time. We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an "always-on" pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
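
The shared-plan idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Query`, `shared_join`, and `dim_key`, the single-dimension schema, and the per-row bitmask are all assumptions made for illustration. It only shows how one scan of a fact table and one shared hash table might serve several in-flight join queries at once, instead of one scan and one hash table per query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Query:
    qid: int                               # hypothetical: small integer, used as a bit position
    dim_predicate: Callable[[dict], bool]  # hypothetical: filter applied to a dimension row


def shared_join(fact_rows: Iterable[dict],
                dim_table: Dict[int, dict],
                queries: List[Query]) -> Dict[int, List[dict]]:
    """Join fact rows with one dimension table for all queries in a single pass."""
    # Build ONE shared hash table. Each dimension row carries a bitmask
    # recording which queries accept it, so its storage is shared too.
    dim_index = {}
    for key, row in dim_table.items():
        mask = 0
        for q in queries:
            if q.dim_predicate(row):
                mask |= 1 << q.qid
        if mask:  # rows no query wants are dropped up front
            dim_index[key] = (row, mask)

    results: Dict[int, List[dict]] = {q.qid: [] for q in queries}

    # ONE scan of the fact table serves every query (shared I/O),
    # and ONE probe per fact row serves every query (shared computation).
    for fact in fact_rows:
        hit = dim_index.get(fact["dim_key"])
        if hit is None:
            continue
        dim_row, mask = hit
        joined = {**fact, **dim_row}
        for q in queries:
            if mask & (1 << q.qid):
                results[q.qid].append(joined)
    return results


if __name__ == "__main__":
    dims = {1: {"region": "EU"}, 2: {"region": "US"}}
    facts = [
        {"dim_key": 1, "amount": 10},
        {"dim_key": 2, "amount": 20},
        {"dim_key": 3, "amount": 30},  # no matching dimension row
    ]
    qs = [
        Query(0, lambda r: r["region"] == "EU"),
        Query(1, lambda r: True),
    ]
    for qid, rows in shared_join(facts, dims, qs).items():
        print(qid, rows)
```

In this sketch the cost of the scan and the probe does not grow with the number of queries; only the bitmask check and result routing do. The sketch processes a fixed batch of queries, whereas the abstract describes an always-on pipeline that admits and retires queries at run time, which is not modeled here.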