Efficient fair queueing using deficit round robin
SIGCOMM '95 Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Increasing Predictive Accuracy by Prefetching Multiple Program and User Specific Files
HPCS '02 Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Variable-length contexts for PPM
DCC '04 Proceedings of the Conference on Data Compression
Database replication policies for dynamic content applications
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Beehive: O(1)lookup performance for power-law query distributions in peer-to-peer overlays
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Beehive: O(1)lookup performance for power-law query distributions in peer-to-peer overlays
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Flexible, wide-area storage for distributed systems with WheelFS
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
PADS: a policy architecture for distributed storage systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
VL2: a scalable and flexible data center network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The nature of data center traffic: measurements & analysis
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
To compress or not to compress - compute vs. IO tradeoffs for mapreduce energy efficiency
Proceedings of the first ACM SIGCOMM workshop on Green networking
Providing a cloud network infrastructure on a supercomputer
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
SRCMap: energy proportional storage using dynamic consolidation
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Everest: scaling down peak loads through I/O off-loading
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Nectar: automatic management of data and computation in datacenters
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Disk-locality in datacenter computing considered irrelevant
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Energy efficiency for large-scale MapReduce workloads with significant interactive analysis
Proceedings of the 7th ACM european conference on Computer Systems
Jockey: guaranteed job latency in data parallel clusters
Proceedings of the 7th ACM european conference on Computer Systems
The HaLoop approach to large-scale iterative data analysis
The VLDB Journal — The International Journal on Very Large Data Bases
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Why let resources idle? aggressive cloning of jobs with dolly
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
MRBS: towards dependability benchmarking for hadoop mapreduce
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Interference and locality-aware task scheduling for MapReduce applications in virtual clusters
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
DBalancer: distributed load balancing for NoSQL data-stores
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
A throughput optimal algorithm for map task scheduling in mapreduce with data locality
ACM SIGMETRICS Performance Evaluation Review
Octopus: efficient data intensive computing on virtualized datacenters
Proceedings of the 6th International Systems and Storage Conference
Leveraging endpoint flexibility in data-intensive clusters
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
The case for tiny tasks in compute clusters
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Performance troubleshooting in data centers: an annotated bibliography?
ACM SIGOPS Operating Systems Review
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
To improve data availability and resilience MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks; thereby increasing the completion times of jobs accessing this data even when there are machines with spare cycles in the cluster. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace driven simulations and experiments in two popular MapReduce frameworks (Hadoop, Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by 20.2%.