Just-in-time data distribution for analytical query processing

  • Authors:
  • Milena Ivanova;Martin Kersten;Fabian Groffen

  • Affiliations:
  • Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands;Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands;Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands

  • Venue:
  • ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2012

Abstract

Distributed processing commonly requires data spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database, and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule that makes the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
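
The bidding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the bid formula, so the sketch assumes a bid equals the number of bytes that would still have to be shipped to a worker. The names Worker, bid, schedule, and the per-column sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    # Columns already shipped to this worker by earlier queries (assumed unit of shipment).
    cached: set = field(default_factory=set)

    def bid(self, needed, sizes):
        """Assumed bid: bytes that would still have to be shipped to this worker."""
        return sum(sizes[col] for col in needed - self.cached)


def schedule(workers, needed, sizes):
    """Master-side step: pick the lowest bidder and ship only the missing data."""
    winner = min(workers, key=lambda w: w.bid(needed, sizes))
    cost = winner.bid(needed, sizes)
    winner.cached |= needed  # shipped just-in-time, kept for reuse by later queries
    return winner, cost


# Hypothetical column sizes in bytes.
sizes = {"a": 100, "b": 50, "c": 10}
workers = [Worker("w1", {"a"}), Worker("w2")]

first, first_cost = schedule(workers, {"a", "b"}, sizes)
second, second_cost = schedule(workers, {"a", "b"}, sizes)
print(first.name, first_cost, second.name, second_cost)
```

In this sketch the worker that already holds column a wins the first query and receives only column b; repeating the query then costs no transfer at all, which is the reuse effect the abstract describes.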