YCSB++: benchmarking and performance debugging advanced features in scalable table stores

Authors:
Swapnil Patil;Milo Polte;Kai Ren;Wittawat Tantisiriroj;Lin Xiao;Julio López;Garth Gibson;Adam Fuchs;Billie Rinaldi
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;National Security Agency;National Security Agency
Venue:
Proceedings of the 2nd ACM Symposium on Cloud Computing
Year:
2011

Abstract

Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping filters from client to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security.
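
The eventual consistency measurement mentioned in the abstract can be illustrated with a toy read-after-write lag probe. The sketch below is not the YCSB++ implementation: `DelayedStore`, `measure_read_after_write_lag`, the tick-based clock, and the fixed visibility delay are all hypothetical names and assumptions introduced only to show the idea of one client inserting a key while another polls until the key becomes readable.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedStore:
    """Hypothetical toy store: a write becomes readable only
    `visibility_delay` ticks after it was issued."""
    visibility_delay: int
    _pending: dict = field(default_factory=dict)  # key -> (value, visible_at_tick)

    def put(self, key, value, now):
        self._pending[key] = (value, now + self.visibility_delay)

    def get(self, key, now):
        entry = self._pending.get(key)
        if entry is None or now < entry[1]:
            return None  # never written, or not yet visible to readers
        return entry[0]


def measure_read_after_write_lag(store, keys, poll_interval=1, max_ticks=1000):
    """Insert each key as a 'producer', then poll as a 'consumer' until the
    key can be read back. Returns the observed lag in ticks per key."""
    lags = {}
    now = 0  # simulated clock (assumption: both clients share it)
    for key in keys:
        write_tick = now
        store.put(key, f"value-{key}", now)
        while store.get(key, now) is None:
            now += poll_interval
            if now - write_tick > max_ticks:
                raise TimeoutError(f"{key} never became visible")
        lags[key] = now - write_tick
    return lags


if __name__ == "__main__":
    store = DelayedStore(visibility_delay=5)
    print(measure_read_after_write_lag(store, ["a", "b"], poll_interval=2))
    # prints {'a': 6, 'b': 6}
```

With a 5-tick visibility delay and a 2-tick polling interval, the probe reports a lag of 6 ticks rather than 5, which shows that the polling granularity bounds the precision of any lag measured this way.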