Recent advances in flash storage have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash storage has many unique characteristics that make existing data management and analytics algorithms designed for magnetic disks perform poorly on flash. For example, while random reads can be nearly as fast as sequential reads, random writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and the geometric file are not readily adapted to flash. Second, we propose B-File, an energy-efficient abstraction for flash storage to store self-expiring items, and show how a B-File can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small memory (RAM) footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-File and to query the large sample stored in a B-File for a subsample of arbitrary size. Finally, we present an evaluation with flash storage that shows our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and the geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that "semi-random" writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.
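To make concrete why reservoir sampling is a poor fit for flash, the following is a minimal sketch of textbook reservoir sampling (Algorithm R), not the flash-friendly variant evaluated in the paper and not B-File. The function name `reservoir_sample` and the `overwrites` log are assumptions introduced here for illustration. The log records which sample slot each replacement touches: every entry is an in-place update at a uniformly random position, which is the access pattern the abstract identifies as slow on flash.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Textbook reservoir sampling (Algorithm R), illustrative only.

    Returns (reservoir, overwrites). `overwrites` is a hypothetical
    instrumentation log of the slot indices replaced in place.
    """
    rng = rng or random.Random()
    reservoir = []
    overwrites = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir sequentially.
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                # In-place update of a uniformly random slot: on flash
                # this would be a random write.
                reservoir[j] = item
                overwrites.append(j)
    return reservoir, overwrites


# Usage: sample 100 items from a stream of 10,000 readings.
sample, overwrites = reservoir_sample(range(10_000), 100, random.Random(0))
print(len(sample), "items kept;", len(overwrites), "in-place overwrites")
```

The slot indices in `overwrites` follow no sequential order. If the sample lived in flash rather than RAM, each of those replacements would become a random write.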