NAND flash storage has proven to be a competitive alternative to traditional disk thanks to its high random-access speed, low power consumption, and presumed efficacy for random reads. Ironically, we demonstrate that when NAND flash is packaged as an SSD, many barriers prevent reads from reaching full parallelism, so that random writes outperform random reads. Motivated by this, we propose Physically Addressed Queuing (PAQ), a request scheduler that avoids the contention arising from shared SSD resources. PAQ makes the following major contributions. First, it exposes the physical addresses of requests to the scheduler. Second, it uses I/O clumping to select groups of operations that can be executed simultaneously without major resource conflict. Third, it uses inter-request NAND transaction packing to enable multi-plane-mode operations. We implement PAQ in a cycle-accurate simulator and demonstrate bandwidth and IOPS improvements greater than 62% and latency reductions of as much as 41.6% for random reads, without degrading the performance of other access types.
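To make the second and third contributions concrete, the following is a minimal illustrative sketch, not the scheduler evaluated in the paper. It assumes a hypothetical physical address of the form (channel, chip, die, plane, page), assumes that requests to the same die and page offset but different planes may share one multi-plane transaction, and treats a (channel, chip) pair as the only contended resource. All names (`PhysAddr`, `Request`, `pack_multiplane`, `select_clump`) are invented for illustration.

```python
from collections import namedtuple

# Hypothetical physical address layout; the field names are assumptions.
PhysAddr = namedtuple("PhysAddr", "channel chip die plane page")
Request = namedtuple("Request", "req_id addr")


def pack_multiplane(queue):
    """Pack requests into transactions (lists of requests).

    Assumption: requests on the same die and page offset, but on
    different planes, can be served by one multi-plane operation.
    """
    transactions = []
    open_by_die = {}  # (channel, chip, die, page) -> transactions on that die
    for req in queue:
        a = req.addr
        die_key = (a.channel, a.chip, a.die, a.page)
        for txn in open_by_die.setdefault(die_key, []):
            # Join an existing transaction only if this plane is still free.
            if all(r.addr.plane != a.plane for r in txn):
                txn.append(req)
                break
        else:
            # No compatible transaction found: start a new one.
            txn = [req]
            open_by_die[die_key].append(txn)
            transactions.append(txn)
    return transactions


def select_clump(transactions):
    """Greedily pick transactions, in queue order, that use distinct chips.

    Assumption: two transactions conflict if and only if they target the
    same (channel, chip) pair. Conflicting transactions are deferred.
    """
    busy = set()
    clump, deferred = [], []
    for txn in transactions:
        key = (txn[0].addr.channel, txn[0].addr.chip)
        if key in busy:
            deferred.append(txn)
        else:
            busy.add(key)
            clump.append(txn)
    return clump, deferred
```

Under these assumptions, a scheduler would call `pack_multiplane` on the pending queue, issue the clump returned by `select_clump`, and retry the deferred transactions in a later round. A real SSD has further shared resources and multi-plane constraints that this sketch ignores.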