The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Towards characterizing cloud backend workloads: insights from Google compute clusters
ACM SIGMETRICS Performance Evaluation Review
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Comet: batched stream processing for data intensive distributed computing
Proceedings of the 1st ACM symposium on Cloud computing
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
An Analysis of Traces from a Production MapReduce Cluster
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
CloudCmp: comparing public cloud providers
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
The Hadoop Distributed File System
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Runtime measurements in the cloud: observing, analyzing, and reducing variance
Proceedings of the VLDB Endowment
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Multicore OS benchmarks: we can do better
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Disk-locality in datacenter computing considered irrelevant
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
TidyFS: a simple and small distributed file system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Hyracks: A flexible and extensible foundation for data-intensive computing
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Cloud MapReduce: A MapReduce Implementation on Top of a Cloud Operating System
CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Managing data transfers in computer clusters with orchestra
Proceedings of the ACM SIGCOMM 2011 conference
Modeling and synthesizing task placement constraints in Google compute clusters
Proceedings of the 2nd ACM Symposium on Cloud Computing
PrIter: a distributed framework for prioritized iterative computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
iMapReduce: A Distributed Computing Framework for Iterative Computation
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
iHadoop: Asynchronous Iterations for MapReduce
CLOUDCOM '11 Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science
Nobody ever got fired for using Hadoop on a cluster
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
Proceedings of the Third ACM Symposium on Cloud Computing
More for your money: exploiting performance heterogeneity in public clouds
Proceedings of the Third ACM Symposium on Cloud Computing
Future Generation Computer Systems
An untold story of redundant clouds: making your service deployment truly reliable
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
A framework for an in-depth comparison of scale-up and scale-out
DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
Hi-index | 0.00 |
Research into distributed parallelism on "the cloud" has surged lately. As the research agenda and methodology in this area are being established, we observe a tendency towards certain common simplifications and shortcuts employed by researchers, which we provocatively term "sins". We believe that these sins, in some cases, are threats to the scientific integrity and practical applicability of the research presented. In this paper, we identify and discuss seven "deadly sins" (many of which we have ourselves committed!), present evidence illustrating that they pose real problems, and discuss ways for the community to avoid them in the future.