Walnut: a unified cloud object store

Authors:
Jianjun Chen;Chris Douglas;Michi Mutsuzaki;Patrick Quaid;Raghu Ramakrishnan;Sriram Rao;Russell Sears
Affiliations:
Yahoo!, Santa Clara, CA, USA;Yahoo!, Santa Clara, USA;Yahoo!, Santa Clara, USA;Yahoo!, Sunnyvale, USA;Yahoo!, Santa Clara, USA;Yahoo!, Santa Clara, USA;Yahoo!, Santa Clara, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Citing 18
Cited 2

Scale and performance in a distributed file system

ACM Transactions on Computer Systems (TOCS)
Caching in the Sprite network file system

ACM Transactions on Computer Systems (TOCS)
The Google file system

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Chain replication for supporting high throughput and availability

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data

OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Dynamo: amazon's highly available key-value store

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
RADOS: a scalable, reliable storage service for petabyte-scale storage clusters

PDSW '07 Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07
PNUTS: Yahoo!'s hosted data serving platform

Proceedings of the VLDB Endowment
Advances in flash memory SSD technology for enterprise database applications

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Cassandra: a decentralized structured storage system

ACM SIGOPS Operating Systems Review
Benchmarking cloud serving systems with YCSB

Proceedings of the 1st ACM symposium on Cloud computing
Extreme scale with full SQL language support in microsoft SQL Azure

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
ZooKeeper: wait-free coordination for internet-scale systems

USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
The Hadoop Distributed File System

MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
Using Paxos to build a scalable, consistent, and highly available datastore

Proceedings of the VLDB Endowment
Windows Azure Storage: a highly available cloud storage service with strong consistency

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
bLSM: a general purpose log structured merge tree

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

bLSM: a general purpose log structured merge tree

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
COSBench: cloud object storage benchmark

Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

Walnut is an object-store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high-availability, across all of Yahoo!'s data clouds. This would enable sharing of hardware resources across hitherto siloed clouds of different types, offering greater potential for intelligent load balancing and efficient elastic operation, and simplify the operational tasks related to data storage. In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and one-hop direct client access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads, and can indeed serve as a common low-level storage layer across a range of cloud systems.