LazyBase: trading freshness for performance in a scalable database

Authors:
James Cipar;Greg Ganger;Kimberly Keeton;Charles B. Morrey, III;Craig A.N. Soules;Alistair Veitch
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA
Venue:
Proceedings of the 7th ACM European Conference on Computer Systems
Year:
2012

Abstract

The LazyBase scalable database system is specialized for the growing class of data analysis applications that extract knowledge from large, rapidly changing data sets. It provides the scalability of popular NoSQL systems without the query-time complexity associated with their eventual consistency models, offering a clear consistency model and explicit per-query control over the trade-off between latency and result freshness. With an architecture designed around batching and pipelining of updates, LazyBase simultaneously ingests atomic batches of updates at very high throughput and serves fast read queries against a stale-but-consistent version of the data. Although slightly stale results are sufficient for many analysis queries, fully up-to-date results can be obtained when necessary by also scanning updates still in the pipeline. Compared to the Cassandra NoSQL system, LazyBase provides 4X-5X higher update throughput and 4X higher read query throughput for range queries while remaining competitive for point queries. We demonstrate LazyBase's trade-off between query latency and result freshness as well as the benefits of its consistency model. We also demonstrate specific cases where Cassandra's consistency model is weaker than LazyBase's.
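
The freshness-versus-latency idea described in the abstract can be illustrated with a toy sketch. This is not LazyBase's API or implementation: the class `ToyStore`, its methods, and the boolean `fresh` flag are all hypothetical names invented here, and the model assumes a single in-memory process with one pipeline stage. It only shows the general pattern of ingesting atomic batches, merging them lazily, and letting each query choose between a cheap stale read and a costlier read that also scans pending batches.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical toy model: atomic update batches that are merged lazily."""

    merged: dict = field(default_factory=dict)   # stale-but-consistent view
    pending: list = field(default_factory=list)  # batches still in the pipeline

    def ingest(self, batch: dict) -> None:
        # Accept a whole batch at once; readers see all of it or none of it.
        self.pending.append(dict(batch))

    def merge_one(self) -> None:
        # Background step: fold the oldest pending batch into the merged view.
        if self.pending:
            self.merged.update(self.pending.pop(0))

    def read(self, key, fresh: bool = False):
        # Cheap read: consult only the merged view, which may be stale.
        if not fresh:
            return self.merged.get(key)
        # Fresh read: also scan pending batches, newest first, at extra cost.
        for batch in reversed(self.pending):
            if key in batch:
                return batch[key]
        return self.merged.get(key)


store = ToyStore()
store.ingest({"a": 1, "b": 2})
store.merge_one()                      # first batch is now in the merged view
store.ingest({"a": 10})                # second batch is still pending
stale = store.read("a")                # sees only the merged batch
latest = store.read("a", fresh=True)   # also scans the pending batch
```

In this sketch the stale read never observes a partially applied batch, because batches move into the merged view as a unit. The system described in the abstract exposes freshness as explicit per-query control rather than a simple on/off flag; the boolean here is a simplification for illustration.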