Walnut is an object store being developed at Yahoo! to serve as a common low-level storage layer for a variety of cloud data management systems, including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). A key performance challenge is therefore to meet the latency and throughput requirements of the wide range of workloads observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high availability, across all of Yahoo!'s data clouds. This would enable hardware resources to be shared across hitherto siloed clouds of different types, offering greater potential for intelligent load balancing and efficient elastic operation, and would simplify the operational tasks related to data storage. In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and gives clients one-hop direct access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads and can indeed serve as a common low-level storage layer across a range of cloud systems.
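The abstract names a quorum-based replication protocol but gives no parameters, so the following is only a generic sketch of the overlap rule that quorum schemes commonly rely on, not Walnut's actual protocol. The function names and the replica counts n, r, and w are assumptions introduced for illustration. With n replicas, a read quorum of size r and a write quorum of size w are guaranteed to share at least one replica exactly when r + w > n.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if any r-replica read quorum must intersect any
    w-replica write quorum among n replicas (hypothetical helper)."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating every pair of quorums.

    Only practical for small n; included to show what the rule means.
    """
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


if __name__ == "__main__":
    # Illustrative values only: 3 replicas, majority reads and writes.
    print(quorums_overlap(3, 2, 2))
    # Reading from a single replica can miss a 2-replica write.
    print(quorums_overlap(3, 1, 2))
```

Under this rule, a reader that contacts r replicas always reaches at least one replica that took part in the latest successful write, which is what lets a client read directly from replicas without going through a coordinator.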