Meeting service level objectives of Pig programs

Authors:
Zhuoyao Zhang;Ludmila Cherkasova;Abhishek Verma;Boon Thau Loo
Affiliations:
University of Pennsylvania;Hewlett-Packard Labs;University of Illinois at Urbana-Champaign;University of Pennsylvania
Venue:
Proceedings of the 2nd International Workshop on Cloud Computing Platforms
Year:
2012

Abstract

Cloud computing offers a compelling platform for accessing large amounts of computing and storage resources on demand. As the technology matures, service providers have started shifting their focus to supporting additional user requirements, such as QoS guarantees and tailored resource provisioning for achieving service performance goals. An increasing number of MapReduce applications associated with live business intelligence require completion time guarantees. We aim to solve the resource provisioning problem: given a Pig program with a completion time goal, estimate the amount of resources (the number of map and reduce slots) required to complete the program within a given (soft) deadline. We develop a simple yet elegant performance model that provides completion time estimates of a Pig program as a function of allocated resources. This model is then used as the basis for solving the inverse resource provisioning problem for Pig programs. We evaluate our approach using a 66-node Hadoop cluster and the popular PigMix benchmark. The designed performance model accurately estimates the required amount of resources for Pig programs with completion time goals: the completion times of the Pig programs with allocated resources are within 10% of the targeted deadlines.
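
The abstract does not spell out the form of the performance model, so the following is only a rough sketch of the forward and inverse problems it describes, not the authors' method. It assumes that a Pig program compiles into a sequence of MapReduce jobs that run one after another, and that each job's time can be approximated by counting task "waves" over the allocated slots. The `JobProfile` fields, the function names, and the brute-force search are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional, Tuple


@dataclass
class JobProfile:
    """Hypothetical per-job profile; the field names are illustrative only."""
    num_map_tasks: int
    avg_map_time: float       # seconds per map task (assumed measured earlier)
    num_reduce_tasks: int
    avg_reduce_time: float    # seconds per reduce task (assumed measured earlier)


def estimate_job_time(job: JobProfile, map_slots: int, reduce_slots: int) -> float:
    """Assumed wave-based estimate: tasks run in rounds of at most `slots` tasks."""
    map_waves = ceil(job.num_map_tasks / map_slots)
    reduce_waves = ceil(job.num_reduce_tasks / reduce_slots)
    return map_waves * job.avg_map_time + reduce_waves * job.avg_reduce_time


def estimate_program_time(jobs: List[JobProfile], map_slots: int,
                          reduce_slots: int) -> float:
    """Forward problem: completion time as a function of allocated slots.

    Assumes the jobs of the program execute sequentially.
    """
    return sum(estimate_job_time(j, map_slots, reduce_slots) for j in jobs)


def min_slots_for_deadline(jobs: List[JobProfile], deadline: float,
                           max_slots: int) -> Optional[Tuple[int, int]]:
    """Inverse problem: smallest (map_slots, reduce_slots) total meeting the deadline.

    Returns None if no allocation up to `max_slots` per type is fast enough.
    """
    best: Optional[Tuple[int, int]] = None
    for m in range(1, max_slots + 1):
        for r in range(1, max_slots + 1):
            if estimate_program_time(jobs, m, r) <= deadline:
                if best is None or m + r < best[0] + best[1]:
                    best = (m, r)
                break  # the estimate never grows with r, so larger r only adds slots
    return best


if __name__ == "__main__":
    # Made-up profiles for a two-job program, purely for demonstration.
    program = [JobProfile(100, 10.0, 20, 30.0), JobProfile(40, 5.0, 10, 20.0)]
    print(estimate_program_time(program, 10, 10))
    print(min_slots_for_deadline(program, 200.0, 50))
```

The exhaustive search keeps the sketch short; with a model that is monotone in the number of slots, a binary search or a closed-form solution would serve the same purpose.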