Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash media has many unique characteristics that cause existing data management and analytics algorithms, which were designed for magnetic disks, to perform poorly on flash storage. For example, while random (page) reads are as fast as sequential reads, random (page) writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large (100 MB or more) random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and the geometric file are not readily adapted to flash. Second, we propose B-FILE, an energy-efficient abstraction for flash media to store self-expiring items, and show how a B-FILE can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small (RAM) memory footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-FILE and to query the large sample stored in a B-FILE for a subsample of an arbitrary size. Finally, we present an evaluation with flash media that shows our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and the geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that "semi-random" writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.
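To make the difficulty with reservoir sampling concrete, the following is a minimal in-memory sketch of the classic reservoir algorithm (often called Algorithm R), not of B-FILE, whose details the abstract does not give. The function name `reservoir_sample` and the `overwritten_slots` log are illustrative assumptions. The log records which slot each accepted item replaces: the positions are uniformly random, so if the reservoir lived on flash, each replacement would be a random in-place update, which is the access pattern the abstract describes as orders of magnitude slower than sequential writes.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Classic reservoir sampling: keep a uniform sample of k items.

    Returns (reservoir, overwritten_slots). The second value is a
    hypothetical log, added here only to show the write pattern.
    """
    rng = rng or random.Random()
    reservoir = []
    overwritten_slots = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: the first k items are appended sequentially.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                # In-place overwrite of a uniformly random slot.
                reservoir[j] = item
                overwritten_slots.append(j)
    return reservoir, overwritten_slots


if __name__ == "__main__":
    sample, slots = reservoir_sample(range(100_000), 8, random.Random(42))
    print("sample:", sample)
    print("slots overwritten, in order:", slots)
```

Running the script prints the slot indices in the order they were overwritten. They follow no sequential pattern, which is why a direct port of this algorithm to flash incurs many random writes.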