The seven deadly sins of cloud computing research

Authors:
Malte Schwarzkopf;Derek G. Murray;Steven Hand
Affiliations:
University of Cambridge Computer Laboratory;Microsoft Research Silicon Valley;University of Cambridge Computer Laboratory
Venue:
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Year:
2012

Citing 40
Cited 5

The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Quincy: fair scheduling for distributed computing clusters

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling

Proceedings of the 5th European conference on Computer systems
Towards characterizing cloud backend workloads: insights from Google compute clusters

ACM SIGMETRICS Performance Evaluation Review
FlumeJava: easy, efficient data-parallel pipelines

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Comet: batched stream processing for data intensive distributed computing

Proceedings of the 1st ACM symposium on Cloud computing
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
An Analysis of Traces from a Production MapReduce Cluster

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Twister: a runtime for iterative MapReduce

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Improving MapReduce performance in heterogeneous environments

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Spark: cluster computing with working sets

HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
CloudCmp: comparing public cloud providers

IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
The Hadoop Distributed File System

MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
HaLoop: efficient iterative data processing on large clusters

Proceedings of the VLDB Endowment
Runtime measurements in the cloud: observing, analyzing, and reducing variance

Proceedings of the VLDB Endowment
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)

Proceedings of the VLDB Endowment
Reining in the outliers in map-reduce clusters using Mantri

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Piccolo: building fast, distributed programs with partitioned tables

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters

Proceedings of the sixth conference on Computer systems
CIEL: a universal execution engine for distributed data-flow computing

Proceedings of the 8th USENIX conference on Networked systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center

Proceedings of the 8th USENIX conference on Networked systems design and implementation
Nova: continuous Pig/Hadoop workflows

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Multicore OS benchmarks: we can do better

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Disk-locality in datacenter computing considered irrelevant

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
TidyFS: a simple and small distributed file system

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Hyracks: A flexible and extensible foundation for data-intensive computing

ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Cloud MapReduce: A MapReduce Implementation on Top of a Cloud Operating System

CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Managing data transfers in computer clusters with orchestra

Proceedings of the ACM SIGCOMM 2011 conference
Modeling and synthesizing task placement constraints in Google compute clusters

Proceedings of the 2nd ACM Symposium on Cloud Computing
PrIter: a distributed framework for prioritized iterative computations

Proceedings of the 2nd ACM Symposium on Cloud Computing
iMapReduce: A Distributed Computing Framework for Iterative Computation

IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
iHadoop: Asynchronous Iterations for MapReduce

CLOUDCOM '11 Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science
Nobody ever got fired for using Hadoop on a cluster

Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Distributed GraphLab: a framework for machine learning and data mining in the cloud

Proceedings of the VLDB Endowment

Heterogeneity and dynamicity of clouds at scale: Google trace analysis

Proceedings of the Third ACM Symposium on Cloud Computing
More for your money: exploiting performance heterogeneity in public clouds

Proceedings of the Third ACM Symposium on Cloud Computing
Performance implications of multi-tier application deployments on Infrastructure-as-a-Service clouds: Towards performance modeling

Future Generation Computer Systems
An untold story of redundant clouds: making your service deployment truly reliable

Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
A framework for an in-depth comparison of scale-up and scale-out

DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Research into distributed parallelism on "the cloud" has surged lately. As the research agenda and methodology in this area are being established, we observe a tendency towards certain common simplifications and shortcuts employed by researchers, which we provocatively term "sins". We believe that these sins, in some cases, are threats to the scientific integrity and practical applicability of the research presented. In this paper, we identify and discuss seven "deadly sins" (many of which we have ourselves committed!), present evidence illustrating that they pose real problems, and discuss ways for the community to avoid them in the future.