Approximation algorithms for scheduling unrelated parallel machines
Mathematical Programming: Series A and B
`` Strong '' NP-Completeness Results: Motivation, Examples, and Implications
Journal of the ACM (JACM)
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Chain replication for supporting high throughput and availability
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
OASIS: anycast for any service
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Dcell: a scalable and fault-tolerant network structure for data centers
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
PortLand: a scalable fault-tolerant layer 2 data center network fabric
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
VL2: a scalable and flexible data center network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
BCube: a high performance, server-centric network architecture for modular data centers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The nature of data center traffic: measurements & analysis
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Hedera: dynamic flow scheduling for data center networks
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Apache hadoop goes realtime at Facebook
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
CoHadoop: flexible data placement and its exploitation in Hadoop
Proceedings of the VLDB Endowment
Managing data transfers in computer clusters with orchestra
Proceedings of the ACM SIGCOMM 2011 conference
Windows Azure Storage: a highly available cloud storage service with strong consistency
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
MicroTE: fine grained traffic engineering for data centers
Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Re-optimizing data-parallel computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Surviving failures in bandwidth-constrained datacenters
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Hierarchical policies for software defined networks
Proceedings of the first workshop on Hot topics in software defined networks
Erasure coding in windows azure storage
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
XORing elephants: novel erasure codes for big data
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers as well as that of others without similar flexibility can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster. We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad -- a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3X (respectively, 1.58X) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad does so with little impact on the long-term storage balance.