ACM Transactions on Database Systems (TODS)
Red brick warehouse: a read-mostly RDBMS for open SMP platforms
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Implementing data cubes efficiently
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An optimal evaluation of Boolean expressions in an online query system
Communications of the ACM
Continuously adaptive continuous queries over streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Adaptive ordering of pipelined stream filters
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
QPipe: a simultaneously pipelined relational query engine
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Integrating compression and execution in column-oriented database systems
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Adaptive aggregation on chip multiprocessors
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cooperative scans: dynamic bandwidth sharing in a DBMS
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
The end of an architectural era: (it's time for a complete rewrite)
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Near-optimal algorithms for shared filter evaluation in data stream systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Scalable regular expression matching on data streams
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A generic flow algorithm for shared filter ordering problems
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Main-memory scan sharing for multi-core CPUs
Proceedings of the VLDB Endowment
Constant-Time Query Processing
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
An architecture for recycling intermediates in a column-store
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Predictable performance for unpredictable workloads
Proceedings of the VLDB Endowment
The Data Cyclotron query processing scheme
Proceedings of the 13th International Conference on Extending Database Technology
The DataPath system: a data-centric analytic processing engine for large data warehouses
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
The pipelined set cover problem
ICDT'05 Proceedings of the 10th international conference on Database Theory
SharedDB: killing one thousand queries with one stone
Proceedings of the VLDB Endowment
On the optimization of schedules for MapReduce workloads in the presence of shared scans
The VLDB Journal — The International Journal on Very Large Data Bases
Sharing data and work across concurrent analytical queries
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plans--unaware of each other--compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly and can be highly erratic when multiple complex queries run at the same time. We present in this paper Cjoin, a new design that substantially improves throughput in large-scale data analytics systems processing many concurrent join queries. In contrast to the conventional query-at-a-time model our approach employs a single physical plan that shares I/O, computation, and tuple storage across all in-flight join queries. We use an "always on" pipeline of non-blocking operators, managed by a controller that continuously examines the current query mix and optimizes the pipeline on the fly. Our design enables data analytics engines to scale gracefully to large data sets, provide predictable execution times, and reduce contention. We implemented Cjoin as an extension to the PostgreSQL DBMS. This prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.