Caching in the Sprite network file system
ACM Transactions on Computer Systems (TOCS)
The Zebra striped network file system
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Serverless network file systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Removal policies in network caches for World-Wide Web documents
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Main Memory Database Systems: An Overview
IEEE Transactions on Knowledge and Data Engineering
Role of Aging, Frequency, and Size in Web Cache Replacement Policies
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Global Memory Management in Client-Server Database Architectures
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Connection scheduling in web servers
USITS'99 Proceedings of the 2nd conference on USENIX Symposium on Internet Technologies and Systems - Volume 2
Cost-aware WWW proxy caching algorithms
USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
The case for RAMClouds: scalable high-performance storage entirely in DRAM
ACM SIGOPS Operating Systems Review
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Nobody ever got fired for using Hadoop on a cluster
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Why let resources idle? aggressive cloning of jobs with dolly
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
MixApart: decoupled analytics for shared storage systems
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
Cake: enabling high-level SLOs on shared storage systems
Proceedings of the Third ACM Symposium on Cloud Computing
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Metadata Traces and Workload Models for Evaluating Big Storage Systems
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Shark: SQL and rich analytics at scale
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Choosy: max-min fair sharing for datacenter jobs with constraints
Proceedings of the 8th ACM European Conference on Computer Systems
A throughput optimal algorithm for map task scheduling in mapreduce with data locality
ACM SIGMETRICS Performance Evaluation Review
Effective straggler mitigation: attack of the clones
NSDI'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Octopus: efficient data intensive computing on virtualized datacenters
Proceedings of the 6th International Systems and Storage Conference
Leveraging endpoint flexibility in data-intensive clusters
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
A case for dynamic memory partitioning in data centers
Proceedings of the Second Workshop on Data Analytics in the Cloud
CooMR: cross-task coordination for efficient data management in MapReduce programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scale-up vs scale-out for Hadoop: time to rethink?
Proceedings of the 4th annual Symposium on Cloud Computing
Joint optimization of overlapping phases in MapReduce
Performance Evaluation
Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads
Proceedings of the VLDB Endowment
REEF: retainable evaluator execution framework
Proceedings of the VLDB Endowment
MixApart: decoupled analytics for shared storage systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
GRASS: trimming stragglers in approximation analytics
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Data-intensive analytics on large clusters is important for modern Internet services. As machines in these clusters have large memories, in-memory caching of inputs is an effective way to speed up these analytics jobs. The key challenge, however, is that these jobs run multiple tasks in parallel, and a job is sped up only when the inputs of all such parallel tasks are cached. Indeed, a single task whose input is not cached can slow down the entire job. To meet this "all-or-nothing" property, we have built PACMan, a caching service that coordinates access to the distributed caches. This coordination is essential to improve job completion times and cluster efficiency. To this end, we have implemented two cache replacement policies on top of PACMan's coordinated infrastructure: LIFE, which minimizes average completion time by evicting large incomplete inputs, and LFU-F, which maximizes cluster efficiency by evicting less frequently accessed inputs. Evaluations on production workloads from Facebook and Microsoft Bing show that PACMan reduces the average completion time of jobs by 53% and 51% (small interactive jobs improve by 77%), and improves cluster efficiency by 47% and 54%, respectively.
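To make the LFU-F idea concrete, the following is a minimal single-machine sketch of frequency-based, whole-file eviction: when the cache is full, the file with the fewest accesses is evicted in its entirety. This is an illustrative simplification, not PACMan's implementation — the actual service coordinates eviction decisions across the distributed caches of a cluster, and the class and method names here are hypothetical.

```python
class LFUFCache:
    """Illustrative LFU-F-style cache: evicts whole, least-frequently-
    accessed files to make room for new inputs (simplified sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity   # total cache size in bytes
        self.used = 0              # bytes currently cached
        self.files = {}            # file name -> [size, access_count]

    def access(self, name, size):
        """Record an access to `name`; cache it, evicting least-frequently-
        accessed files first if space is needed."""
        if name in self.files:
            self.files[name][1] += 1
            return
        # Evict the file with the lowest access count until the new file fits.
        while self.used + size > self.capacity and self.files:
            victim = min(self.files, key=lambda f: self.files[f][1])
            self.used -= self.files[victim][0]
            del self.files[victim]
        if size <= self.capacity:
            self.files[name] = [size, 1]
            self.used += size
```

For example, with a 100-byte cache holding a 60-byte file accessed twice and a 30-byte file accessed once, inserting a 40-byte file evicts the 30-byte file, since it has the lower access frequency.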