Adaptive query execution for data management in the cloud

Authors:
Adrian Daniel Popescu;Debabrata Dash;Verena Kantere;Anastasia Ailamaki
Affiliations:
Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland;Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland;Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland;Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
Venue:
CloudDB '10 Proceedings of the second international workshop on Cloud data management
Year:
2010

Abstract

A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those offered by parallel DBMSs. These techniques, however, cannot guarantee high performance in the cloud: parallel DBMSs lack adequate fault-tolerance mechanisms to deal with non-negligible software and hardware failures. MapReduce, on the other hand, allows query processing solutions that are fault tolerant, but it imposes substantial overheads. In this paper, we propose an adaptive software architecture that can effortlessly switch between MapReduce and parallel DBMS execution in order to process queries efficiently regardless of their response times. Switching between the two architectures is performed transparently, based on an intuitive cost model that computes the expected execution time in the presence of failures. The experimental results show that the adaptive architecture achieves the lowest possible query execution time for various scenarios.
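
The abstract does not spell out the cost model, so the following is only an illustrative sketch of how an expected-time comparison under failures could drive the choice of engine. It is not the authors' model. It assumes failures arrive as a Poisson process with rate `failure_rate`, that a parallel DBMS restarts the whole query after any failure, and that MapReduce re-executes only the failed wave of tasks. All function names, parameters, and the wave-based approximation are hypothetical.

```python
import math


def expected_time_with_restart(run_time: float, failure_rate: float) -> float:
    """Expected completion time of work that restarts from scratch on failure.

    Assumes Poisson failures with the given rate and zero restart overhead,
    for which the expectation is (exp(rate * T) - 1) / rate.
    """
    if failure_rate == 0:
        return float(run_time)
    return math.expm1(failure_rate * run_time) / failure_rate


def expected_dbms_time(run_time: float, failure_rate: float) -> float:
    # Assumption: a failure anywhere forces the whole query to be rerun.
    return expected_time_with_restart(run_time, failure_rate)


def expected_mapreduce_time(run_time: float, failure_rate: float,
                            num_waves: int) -> float:
    # Assumption: the job is a sequence of equal-length waves, and a failure
    # only forces the current wave to be rerun. run_time is the failure-free
    # time and already includes MapReduce's overheads.
    wave_time = run_time / num_waves
    return num_waves * expected_time_with_restart(wave_time, failure_rate)


def choose_engine(dbms_time: float, mr_time: float, failure_rate: float,
                  num_waves: int) -> str:
    """Pick the engine with the lower expected time under failures."""
    dbms = expected_dbms_time(dbms_time, failure_rate)
    mr = expected_mapreduce_time(mr_time, failure_rate, num_waves)
    return "parallel_dbms" if dbms <= mr else "mapreduce"


if __name__ == "__main__":
    rate = 1 / 3600  # hypothetical: one failure per hour across the cluster
    # Short query: restart cost is small, so MapReduce overhead dominates.
    print(choose_engine(dbms_time=60, mr_time=180,
                        failure_rate=rate, num_waves=10))
    # Long query: whole-query restarts become far more expensive.
    print(choose_engine(dbms_time=36000, mr_time=108000,
                        failure_rate=rate, num_waves=1000))
```

Under these assumptions the short query is routed to the parallel DBMS and the long one to MapReduce, which matches the trade-off the abstract describes: restart-from-scratch is cheap for short queries but grows exponentially with query length, while task-level recovery pays a roughly constant overhead.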