Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
SETI@home: an experiment in public-resource computing
Communications of the ACM
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Making cloud intermediate data fault-tolerant
Proceedings of the 1st ACM symposium on Cloud computing
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Maestro: quality-of-service in large disk arrays
Proceedings of the 8th ACM international conference on Autonomic computing
Managing data transfers in computer clusters with orchestra
Proceedings of the ACM SIGCOMM 2011 conference
Warehouse-Scale Computing: Entering the Teenage Decade
Proceedings of the 38th annual international symposium on Computer architecture
Improving speedup and response times by replicating parallel programs on a SNOW
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Energy efficiency for large-scale MapReduce workloads with significant interactive analysis
Proceedings of the 7th ACM european conference on Computer Systems
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
Proceedings of the Third ACM Symposium on Cloud Computing
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
The case for tiny tasks in compute clusters
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Limplock: understanding the impact of limpware on scale-out cloud systems
Proceedings of the 4th annual Symposium on Cloud Computing
Quasar: resource-efficient and QoS-aware cluster management
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
GRASS: trimming stragglers in approximation analytics
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Small jobs, that are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency sensitive jobs have stragglers that are on average 8 times slower than themedian task in that job. Such stragglers increase the average job duration by 47%. This is because current mitigation techniques all involve an element of waiting and speculation. We instead propose full cloning of small jobs, avoiding waiting and speculation altogether. Cloning of small jobs only marginally increases utilization because workloads show that while the majority of jobs are small, they only consume a small fraction of the resources. The main challenge of cloning is, however, that extra clones can cause contention for intermediate data. We use a technique, delay assignment, which efficiently avoids such contention. Evaluation of our system, Dolly, using production workloads shows that the small jobs speedup by 34% to 46% after state-of-the-artmitigation techniques have been applied, using just 5% extra resources for cloning.