Managing data transfers in computer clusters with Orchestra

Authors:
Mosharaf Chowdhury;Matei Zaharia;Justin Ma;Michael I. Jordan;Ion Stoica
Affiliations:
University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA
Venue:
Proceedings of the ACM SIGCOMM 2011 conference
Year:
2011

Abstract

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.
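
To make the idea of transfer-level scheduling concrete, the following is a minimal sketch of a toy model, not the paper's algorithm or implementation. It assumes a single shared link of fixed capacity and compares equal sharing between transfers with strict prioritization of one transfer over another. The function names, the transfer names, the sizes, and the capacity are all hypothetical and chosen only for illustration; the measured 1.7X figure above comes from the authors' prototype, not from this model.

```python
def fair_share_completion(sizes, capacity):
    """Completion time of each transfer when all active transfers
    split one link's capacity equally (toy model, hypothetical)."""
    remaining = dict(sizes)
    done = {}
    now = 0.0
    while remaining:
        rate = capacity / len(remaining)           # per-transfer rate
        smallest = min(remaining, key=remaining.get)
        dt = remaining[smallest] / rate            # time until next finish
        now += dt
        for name in list(remaining):
            remaining[name] -= rate * dt
            if remaining[name] <= 1e-9:            # finished in this step
                done[name] = now
                del remaining[name]
    return done


def priority_completion(sizes, priorities, capacity):
    """Completion time of each transfer when the link serves transfers
    one at a time, highest priority first (toy model, hypothetical)."""
    now = 0.0
    done = {}
    for name in sorted(sizes, key=lambda n: -priorities[n]):
        now += sizes[name] / capacity
        done[name] = now
    return done


# Two equal-sized transfers on one link; all values are illustrative.
sizes = {"high": 100.0, "low": 100.0}
priorities = {"high": 1, "low": 0}
capacity = 10.0

fair = fair_share_completion(sizes, capacity)
prio = priority_completion(sizes, priorities, capacity)
print("fair share:", fair)
print("priority:  ", prio)
```

In this toy setting the prioritized transfer finishes in half the time it would take under equal sharing, while the low-priority transfer finishes no later than before. Real clusters have many links and transfers with many flows each, so the gains measured in practice depend on the workload and topology.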