Experiences with the Amoeba distributed operating system
Communications of the ACM
Transparent process migration: design alternatives and the sprite implementation
Software—Practice & Experience
Trace-driven modeling and analysis of CPU scheduling in a multiprogramming system
Communications of the ACM
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Live migration of virtual machines
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Managing data transfers in computer clusters with orchestra
Proceedings of the ACM SIGCOMM 2011 conference
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Load Balancing in MapReduce Based on Scalable Cardinality Estimates
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Communications of the ACM
Omega: flexible, scalable schedulers for large compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
Effective straggler mitigation: attack of the clones
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Shark: SQL and rich analytics at scale
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Sparrow: distributed, low latency scheduling
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
GRASS: trimming stragglers in approximation analytics
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
We argue for breaking data-parallel jobs in compute clusters into tiny tasks that each complete in hundreds of milliseconds. Tiny tasks avoid the need for complex skew mitigation techniques: by breaking a large job into millions of tiny tasks, work will be evenly spread over available resources by the scheduler. Furthermore, tiny tasks alleviate long wait times seen in today's clusters for interactive jobs: even large batch jobs can be split into small tasks that finish quickly. We demonstrate a 5.2× improvement in response times due to the use of smaller tasks. In current data-parallel computing frameworks, high task launch overheads and scalability limitations prevent users from running short tasks. Recent research has addressed many of these bottlenecks; we discuss remaining challenges and propose a task execution framework that can efficiently support tiny tasks.