Janus: optimal flash provisioning for cloud storage workloads

Authors:
Christoph Albrecht;Arif Merchant;Murray Stokely;Muhammad Waliji;François Labelle;Nate Coehlo;Xudong Shi;C. Eric Schrock
Affiliations:
Google, Inc.;Google, Inc.;Google, Inc.;Google, Inc.;Google, Inc.;Google, Inc.;Google, Inc.;Google, Inc.
Venue:
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Year:
2013

Abstract

Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers, flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) policy or a Least-Recently-Used (LRU) policy, subject to per-workload allocations. Because of the large scale of the system, Janus uses sampled distributed traces to construct compact metrics of the cacheability of the different workloads. From these metrics, we formulate and solve an optimization problem to determine the flash allocation to workloads that maximizes the total reads sent to the flash tier, subject to operator-set priorities and bounds on flash write rates. Using measurements from production workloads in multiple data centers that follow these recommendations, as well as traces of other production workloads, we show that the resulting allocation improves the flash hit rate by 47-76% compared to a unified tier shared by all workloads. Based on these results and an analysis of several thousand production workloads, we conclude that flash storage is a cost-effective complement to disks in data centers.
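
To make the allocation problem concrete, here is a minimal sketch of one way to split a fixed flash capacity across workloads so that total flash reads are maximized. It is not the method from the paper, and it rests on several assumptions: each workload's cacheability is given as a concave piecewise-linear curve (a list of segments, each capturing fewer reads per GiB than the one before), operator priorities and write-rate bounds are ignored, and the names `allocate_flash`, `flash_reads`, `logs`, and `index` are hypothetical. Under the concavity assumption, greedily funding the segments with the most reads per GiB first is optimal for this simplified problem.

```python
from typing import Dict, List, Tuple

# Hypothetical curve format: each segment (flash_gib, reads_per_sec) means
# "the next flash_gib of flash captures reads_per_sec additional reads".
# Assumption: within a workload, reads per GiB decreases from segment to segment.
Segment = Tuple[float, float]


def allocate_flash(curves: Dict[str, List[Segment]], capacity: float) -> Dict[str, float]:
    """Greedily spend flash capacity on the segments with the most reads per GiB."""
    candidates = []
    for workload, segments in curves.items():
        for size, reads in segments:
            if size > 0:
                candidates.append((reads / size, workload, size))
    # Highest marginal benefit (reads per GiB) first.
    candidates.sort(key=lambda c: c[0], reverse=True)

    allocation = {workload: 0.0 for workload in curves}
    remaining = capacity
    for _density, workload, size in candidates:
        if remaining <= 0:
            break
        take = min(size, remaining)  # the last funded segment may be partial
        allocation[workload] += take
        remaining -= take
    return allocation


def flash_reads(segments: List[Segment], flash_gib: float) -> float:
    """Reads per second served from flash for one workload at a given allocation."""
    total = 0.0
    for size, reads in segments:
        if flash_gib <= 0 or size <= 0:
            break
        used = min(size, flash_gib)
        total += reads * used / size
        flash_gib -= used
    return total


# Made-up example curves for two workloads sharing 20 GiB of flash.
curves = {
    "logs": [(10, 500), (10, 100)],
    "index": [(5, 400), (20, 300)],
}
allocation = allocate_flash(curves, capacity=20)
total_reads = sum(flash_reads(curves[w], allocation[w]) for w in curves)
print(allocation, total_reads)
```

A fuller treatment would add the constraints named in the abstract (operator-set priorities and bounds on flash write rates), at which point a simple greedy pass no longer suffices in general.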