Just-in-time data distribution for analytical query processing

Authors:
Milena Ivanova; Martin Kersten; Fabian Groffen
Affiliations:
Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands (all authors)
Venue:
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Year:
2012

Abstract

Distributed processing commonly requires data to be spread across machines using a priori static or hash-based allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes under a need-to-know policy and is reused, where possible, in subsequent queries. A bidding mechanism among the workers yields a schedule that makes the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and data allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
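
The bidding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Worker` class, the `schedule` function, the column names, and the byte sizes are all hypothetical, and the sketch assumes a bid is simply the number of bytes the master would still have to ship to a worker. It also ignores worker load, which a real scheduler would likely take into account.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    # Columns shipped to this worker by earlier queries (hypothetical cache model).
    cached: set[str] = field(default_factory=set)

    def bid(self, needed: dict[str, int]) -> int:
        """Bytes the master would still have to ship for this worker to run the query."""
        return sum(size for col, size in needed.items() if col not in self.cached)


def schedule(workers: list[Worker], needed: dict[str, int]) -> tuple[Worker, int]:
    """Pick the lowest bidder, ship only the missing columns, return (worker, bytes shipped)."""
    winner = min(workers, key=lambda w: w.bid(needed))
    shipped = winner.bid(needed)
    # Just-in-time shipment: the data stays on the worker for reuse by later queries.
    winner.cached.update(needed)
    return winner, shipped


# Assumed column sizes in bytes, for illustration only.
sizes = {"lineitem.price": 400, "lineitem.qty": 400, "orders.date": 100}
workers = [Worker("w1"), Worker("w2")]

first = schedule(workers, {c: sizes[c] for c in ("lineitem.price", "lineitem.qty")})
second = schedule(workers, {c: sizes[c] for c in ("lineitem.price", "orders.date")})
```

In this sketch the second query goes to the worker that already holds `lineitem.price`, so only `orders.date` has to be transferred. That is the reuse effect the abstract attributes to the bidding mechanism.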