Cut problems and their application to divide-and-conquer
Approximation algorithms for NP-hard problems
SplitStream: high-bandwidth multicast in cooperative environments
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Modeling distances in large-scale networks by matrix factorization
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
On Cooperative Content Distribution and the Price of Barter
ICDCS '05 Proceedings of the 25th IEEE International Conference on Distributed Computing Systems
A clean slate 4D approach to network control and management
ACM SIGCOMM Computer Communication Review
Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Ethane: taking control of the enterprise
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
A policy-aware switching layer for data centers
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Dcell: a scalable and fault-tolerant network structure for data centers
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Large-Scale Parallel Collaborative Filtering for the Netflix Prize
AAIM '08 Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management
Antfarm: efficient content distribution with managed swarms
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
PortLand: a scalable fault-tolerant layer 2 data center network fabric
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
VL2: a scalable and flexible data center network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
BCube: a high performance, server-centric network architecture for modular data centers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Safe and effective fine-grained TCP retransmissions for datacenter communication
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Understanding TCP incast throughput collapse in datacenter networks
Proceedings of the 1st ACM workshop on Research on enterprise networking
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Symbiotic routing in future data centers
Proceedings of the ACM SIGCOMM 2010 conference
Hedera: dynamic flow scheduling for data center networks
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Sharing the data center network
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Tesseract: a 4D network control plane
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Design and Evaluation of a Real-Time URL Spam Filtering Service
SP '11 Proceedings of the 2011 IEEE Symposium on Security and Privacy
A Survey on Network Coordinates Systems, Design, and Security
IEEE Communications Surveys & Tutorials
Multipoint communication: a survey of protocols, functions, and mechanisms
IEEE Journal on Selected Areas in Communications
Scaling the mobile millennium system in the cloud
Proceedings of the 2nd ACM Symposium on Cloud Computing
Bandwidth on demand for inter-data center communication
Proceedings of the 10th ACM Workshop on Hot Topics in Networks
NaaS: network-as-a-service in the cloud
Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
Re-optimizing data-parallel computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
FairCloud: sharing the network in cloud computing
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Opening up black box networks with CloudTalk
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
A case for performance-centric network allocation
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
FairCloud: sharing the network in cloud computing
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Torchestra: reducing interactive traffic delays over tor
Proceedings of the 2012 ACM workshop on Privacy in the electronic society
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
Distributed adaptive routing for big-data applications running on data center networks
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
Datacast: a scalable and efficient reliable group data delivery service for data centers
Proceedings of the 8th international conference on Emerging networking experiments and technologies
Bridging the gap between applications and networks in data centers
ACM SIGOPS Operating Systems Review
ClouDiA: a deployment advisor for public clouds
Proceedings of the VLDB Endowment
Sparkler: supporting large-scale matrix factorization
Proceedings of the 16th International Conference on Extending Database Technology
Revisiting flow-based load balancing: Stateless path selection in data center networks
Computer Networks: The International Journal of Computer and Telecommunications Networking
Effective straggler mitigation: attack of the clones
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
F10: a fault-tolerant engineered network
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Achieving high utilization with software-driven WAN
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Leveraging endpoint flexibility in data-intensive clusters
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
XORing elephants: novel erasure codes for big data
Proceedings of the VLDB Endowment
International Journal of Web and Grid Services
The case for tiny tasks in compute clusters
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Supporting application-specific in-network processing in data centres
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
ACM SIGCOMM Computer Communication Review
Proceedings of the 4th annual Symposium on Cloud Computing
Apache Hadoop YARN: yet another resource negotiator
Proceedings of the 4th annual Symposium on Cloud Computing
Hi-index | 0.00 |
Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.