Hot data identification for flash-based storage systems using multiple bloom filters

Authors:
Dongchul Park;David H. C. Du
Affiliations:
Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, 55455, USA;Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, 55455, USA
Venue:
MSST '11 Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies
Year:
2011

Abstract

Hot data identification applies to a variety of fields. In flash memory in particular, it critically affects both performance (through garbage collection) and life span (through wear leveling). Although hot data identification is of paramount importance in flash memory, it has received little investigation. Moreover, all existing schemes focus almost exclusively on frequency. For effective hot data identification, however, recency must be weighed equally with frequency. In this paper, we propose a novel hot data identification scheme that uses multiple bloom filters to efficiently capture finer-grained recency as well as frequency. In addition, we propose a Window-based Direct Address Counting (WDAC) algorithm that approximates ideal hot data identification and serves as our baseline. Unlike the existing baseline algorithm, which cannot appropriately capture recency information because of its exponential batch decay, WDAC uses a sliding window to capture very fine-grained recency information. Our experimental evaluation with diverse realistic workloads, including real SSD traces, demonstrates that our multiple bloom filter-based scheme outperforms the state-of-the-art scheme. In particular, it consumes 50% less memory, reduces computational overhead by up to 58%, and improves performance by up to 65%.
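
The abstract gives no implementation details, so the following is only a minimal sketch of how several bloom filters might jointly track frequency and recency. It is not the authors' design. The class name, the parameter values, the "insert into the first filter that does not yet contain the address" policy, and the "clear one filter every decay_interval writes" aging rule are all assumptions made for illustration. Under these assumptions, the number of filters that contain an address approximates its write frequency, and periodically clearing filters makes old writes fade, which approximates recency.

```python
import hashlib


class MultiBloomHotDetector:
    """Illustrative sketch only; every parameter and policy here is an assumption."""

    def __init__(self, num_filters=4, bits=2048, num_hashes=2,
                 decay_interval=512, hot_threshold=2):
        # One byte per bloom-filter bit keeps the sketch simple (not memory-optimal).
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes
        self.decay_interval = decay_interval  # writes between filter resets
        self.hot_threshold = hot_threshold    # filters that must contain an address
        self.writes = 0
        self.next_to_clear = 0                # round-robin index of the filter to reset

    def _positions(self, lba):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.blake2b(str(lba).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.num_hashes)]

    @staticmethod
    def _contains(bloom, positions):
        return all(bloom[p] for p in positions)

    def record_write(self, lba):
        positions = self._positions(lba)
        # Frequency: each repeated write marks the address in one more filter.
        for bloom in self.filters:
            if not self._contains(bloom, positions):
                for p in positions:
                    bloom[p] = 1
                break
        # Recency: periodically reset one filter so that old writes are forgotten.
        self.writes += 1
        if self.writes % self.decay_interval == 0:
            self.filters[self.next_to_clear] = bytearray(self.bits)
            self.next_to_clear = (self.next_to_clear + 1) % len(self.filters)

    def is_hot(self, lba):
        positions = self._positions(lba)
        hits = sum(self._contains(bloom, positions) for bloom in self.filters)
        return hits >= self.hot_threshold


detector = MultiBloomHotDetector()
for _ in range(3):
    detector.record_write(7)   # logical block address 7 is written repeatedly
detector.record_write(9)       # address 9 is written once
print(detector.is_hot(7), detector.is_hot(9))
```

As with any bloom filter, membership tests can return false positives, so a cold address may occasionally be reported as hot. The filter size, hash count, and threshold trade memory against that error rate.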