bLSM: a general purpose log structured merge tree

Authors:
Russell Sears; Raghu Ramakrishnan
Affiliations:
Yahoo!, Santa Clara, CA, USA; Yahoo!, Santa Clara, CA, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: update-in-place systems have unmatched latency but poor write throughput, while existing log structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes. We begin by presenting a new performance metric, read fanout, and argue that, together with read and write amplification, it better characterizes real-world indexes than approaches such as asymptotic analysis and price/performance. We then present bLSM, a Log Structured Merge (LSM) tree with the advantages of both B-Trees and log structured approaches: (1) unlike existing log structured trees, bLSM has near-optimal read and scan performance, and (2) its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring that merges at each level of the tree make steady progress without resorting to techniques that degrade read performance. We use Bloom filters to improve index performance, and find that a number of subtleties arise. First, we ensure reads can stop after finding one version of a record; otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert, so avoiding the seek performed by that check is crucial.
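The two Bloom filter subtleties in the abstract can be illustrated with a minimal sketch. It is not bLSM's implementation. Python dicts stand in for on-disk B-Tree components, and the names `BloomFilter`, `Component`, `SketchLSM`, `memtable` and `insert_if_absent` are hypothetical. The filter derives its bit positions from one SHA-256 digest by double hashing, which is an assumed hashing scheme. Deletes and tombstones are ignored. Components are probed newest to oldest, so the first version found is the most recent one and the read can stop there. An insert-time existence check for an absent key usually pays no seek, because every filter rules the key out.

```python
import hashlib
from typing import Dict, Iterator, List, Optional


class BloomFilter:
    """Minimal Bloom filter; k positions come from one SHA-256 digest via double hashing (assumed scheme)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Component:
    """Hypothetical stand-in for one on-disk tree component: a dict plus its Bloom filter."""

    def __init__(self, records: Dict[str, str]) -> None:
        self.records = dict(records)
        self.bloom = BloomFilter()
        for key in self.records:
            self.bloom.add(key)
        self.lookups = 0  # counts simulated B-Tree lookups (seeks)

    def get(self, key: str) -> Optional[str]:
        self.lookups += 1
        return self.records.get(key)


class SketchLSM:
    """Hypothetical read path: in-memory buffer first, then components newest to oldest."""

    def __init__(self, components: List[Component]) -> None:
        self.memtable: Dict[str, str] = {}
        self.components = components  # ordered newest first

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for component in self.components:
            if not component.bloom.might_contain(key):
                continue  # filter rules the key out, so no seek is paid
            value = component.get(key)
            if value is not None:
                return value  # newest version found: stop, skip older components
        return None

    def insert_if_absent(self, key: str, value: str) -> bool:
        # The existence check reuses get(); for absent keys the Bloom
        # filters usually eliminate every simulated seek.
        if self.get(key) is not None:
            return False
        self.memtable[key] = value
        return True
```

For example, with `older = Component({"a": "v1", "b": "x"})` and `newer = Component({"a": "v2"})`, calling `SketchLSM([newer, older]).get("a")` returns `"v2"` after a single lookup on `newer`, and `older.lookups` stays at 0 even though `older` also holds a version of `"a"`.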