Orchestrating the deployment of computations in the cloud with Conductor

Authors:
Alexander Wieder;Pramod Bhatotia;Ansley Post;Rodrigo Rodrigues
Affiliations:
Max Planck Institute for Software Systems;Max Planck Institute for Software Systems;Google Inc. and Max Planck Institute for Software Systems;CITI, Universidade Nova de Lisboa and Max Planck Institute for Software Systems
Venue:
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Year:
2012

Abstract

When organizations move computation to the cloud, they must choose from a myriad of cloud services that can be used to outsource these jobs. The impact of this choice on price and performance is unclear, even for technical users. To further complicate this choice, factors like price fluctuations due to spot markets, or the cost of recovering from faults must also be factored in. In this paper, we present Conductor, a system that frees cloud customers from the burden of deciding which services to use when deploying MapReduce computations in the cloud. With Conductor, customers only specify goals, e.g., minimizing monetary cost or completion time, and the system automatically selects the best cloud services to use, deploys the computation according to that selection, and adapts to changing conditions at deployment time. The design of Conductor includes several novel features, such as a system to manage the deployment of cloud computations across different services, and a resource abstraction layer that provides a unified interface to these services, therefore hiding their low-level differences and simplifying the planning and deployment of the computation. We implemented Conductor and integrated it with the Hadoop framework. Our evaluation using Amazon Web Services shows that Conductor can find very subtle opportunities for cost savings while meeting deadline requirements, and that Conductor incurs a modest overhead due to planning computations and the resource abstraction layer.
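
The goal-driven selection described in the abstract (state a goal such as minimal monetary cost under a deadline, and let the system pick the services) can be illustrated with a toy example. The following is a minimal sketch only, not Conductor's planner: it uses brute-force enumeration over a made-up catalogue, and the names Service, Plan, and cheapest_plan, the hourly billing model, and all prices and throughputs are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from math import ceil
from typing import Iterable, Optional


@dataclass(frozen=True)
class Service:
    """A hypothetical compute offering; every figure here is made up."""
    name: str
    price_per_node_hour: float  # cost of one node for one started hour
    gb_per_node_hour: float     # input data one node processes per hour


@dataclass(frozen=True)
class Plan:
    service: str
    nodes: int
    hours: int
    cost: float


def cheapest_plan(services: Iterable[Service], input_gb: float,
                  deadline_hours: int, max_nodes: int = 100) -> Optional[Plan]:
    """Enumerate (service, node count) pairs and keep the cheapest one
    that finishes within the deadline; return None if none qualifies."""
    best = None
    for svc in services:
        for nodes in range(1, max_nodes + 1):
            # Assumed billing model: whole hours, perfect linear scaling.
            hours = ceil(input_gb / (svc.gb_per_node_hour * nodes))
            if hours > deadline_hours:
                continue  # this configuration misses the deadline
            cost = nodes * hours * svc.price_per_node_hour
            if best is None or cost < best.cost:
                best = Plan(svc.name, nodes, hours, cost)
    return best


# Hypothetical catalogue: a cheap slow node type and a fast expensive one.
catalogue = [
    Service("small", price_per_node_hour=1.0, gb_per_node_hour=1.0),
    Service("large", price_per_node_hour=5.0, gb_per_node_hour=4.0),
]
plan = cheapest_plan(catalogue, input_gb=100, deadline_hours=10)
```

The customer supplies only the goal (the deadline and the implicit cost objective); the choice of service and node count falls out of the search. A real planner would also have to account for the factors the abstract mentions, such as spot-price fluctuations and fault recovery, which this sketch ignores.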