Expedited rating of data stores using agile data loading techniques

Authors:
Sumita Barahmand;Shahram Ghandeharizadeh
Affiliations:
University of Southern California, Los Angeles, CA, USA;University of Southern California, Los Angeles, CA, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013


Abstract

To benchmark and rate a data store, one must repeat experiments that impose different amounts of load on the data store. Workloads that modify the benchmark database may require the same database to be loaded repeatedly, which can constitute a significant portion of the time required to rate a data store. This paper presents several agile data loading techniques to expedite the rating process. These techniques include: generating the disk image of the database once and re-using it; restoring updated data items to their original values; maintaining the in-memory state of the database across different experiments to avoid repeated loading of the database altogether; and a hybrid of the third technique combined with the other two. These techniques are general purpose and apply to a variety of cloud benchmarks. We describe their implementation and evaluate them in the context of one such benchmark, BG. Results show a two- to twelve-fold speedup in the rating process. As an example, when evaluating MongoDB with a million-member BG database, these techniques reduce BG's rating from 4 months (123 days) of continuous running to less than 11 days for the first rating experiment. Subsequent ratings of MongoDB with different workloads using the same database are much faster, on the order of hours.
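The second technique, restoring updated data items to their original values, can be illustrated with a minimal sketch. It assumes a simple key-value interface and uses a Python dict as a stand-in for the data store; the class `RestorableStore`, the functions `rate` and `friend_experiment`, and the pre-image bookkeeping are all hypothetical and are not BG's actual implementation. The idea shown is that an experiment records the original value of each item the first time it modifies it, so that between experiments only those items are rewritten instead of reloading the whole database.

```python
from typing import Any, Callable, Dict, Hashable, List


class RestorableStore:
    """Hypothetical dict-backed store that remembers the pre-image of every
    item an experiment modifies, so the database can be returned to its
    initial state without being reloaded."""

    _MISSING = object()  # sentinel: the key did not exist before the experiment

    def __init__(self, initial: Dict[Hashable, Any]) -> None:
        self._data: Dict[Hashable, Any] = dict(initial)
        self._pre_images: Dict[Hashable, Any] = {}

    def read(self, key: Hashable) -> Any:
        return self._data.get(key)

    def _remember(self, key: Hashable) -> None:
        # Record the original value only on the first modification of a key.
        if key not in self._pre_images:
            self._pre_images[key] = self._data.get(key, self._MISSING)

    def write(self, key: Hashable, value: Any) -> None:
        self._remember(key)
        self._data[key] = value

    def delete(self, key: Hashable) -> None:
        self._remember(key)
        self._data.pop(key, None)

    def restore(self) -> int:
        """Undo every modification since the last restore.
        Returns the number of items rewritten (not the database size)."""
        for key, old in self._pre_images.items():
            if old is self._MISSING:
                self._data.pop(key, None)  # key was inserted by the experiment
            else:
                self._data[key] = old
        touched = len(self._pre_images)
        self._pre_images.clear()
        return touched

    def snapshot(self) -> Dict[Hashable, Any]:
        return dict(self._data)


def rate(store: RestorableStore, load_levels: List[int],
         run_experiment: Callable[[RestorableStore, int], Any]) -> List[Any]:
    """Run one experiment per load level, restoring (not reloading) in between."""
    results = []
    for load in load_levels:
        results.append(run_experiment(store, load))
        store.restore()  # cost grows with items updated, not with database size
    return results


def friend_experiment(store: RestorableStore, load: int) -> int:
    # Hypothetical workload: the first `load` members each gain one friend.
    for i in range(load):
        key = f"member{i}"
        store.write(key, store.read(key) + 1)
    return load


if __name__ == "__main__":
    initial = {f"member{i}": 0 for i in range(100)}
    store = RestorableStore(initial)
    print(rate(store, [10, 20, 40], friend_experiment))  # prints [10, 20, 40]
    print(store.snapshot() == initial)                   # prints True
```

The design choice this sketch highlights is the cost trade-off: a full reload touches every item in the database, whereas restoring touches only the items a given experiment updated, which is why the benefit is largest when workloads modify a small fraction of a large database.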