In a traditional block I/O path, the operating system completes virtually all I/Os asynchronously via interrupts. However, when storage I/O is performed on ultra-low latency devices built from next-generation non-volatile memory, polling for completion, and hence spending clock cycles waiting during the I/O, can be shown to deliver higher performance than traditional interrupt-driven I/O. This paper therefore argues for the synchronous completion of block I/O: first by presenting strong empirical evidence of a stack latency advantage, second by delineating the limits of the current interrupt-driven path, and third by proving that synchronous completion is safe and correct. The paper further discusses the challenges and opportunities that the synchronous I/O completion model introduces for both operating system kernels and user applications.
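The trade-off described above can be illustrated with a back-of-the-envelope cost model. This is a minimal sketch, not the paper's measurement method: the `IoCosts` fields, the function names, and every number used with them are hypothetical. It assumes that the polling path keeps the CPU busy for the whole device service time, while the interrupt path frees the CPU during the I/O but pays a larger fixed software overhead for context switches and interrupt handling.

```python
from dataclasses import dataclass


@dataclass
class IoCosts:
    """Per-I/O costs in microseconds. All values are hypothetical inputs."""
    device_us: float          # time the device needs to service the request
    sync_overhead_us: float   # software cost of the polling (synchronous) path
    async_overhead_us: float  # software cost of the interrupt-driven path


def latency_us(costs: IoCosts, polling: bool) -> float:
    """End-to-end latency seen by the caller for one I/O."""
    overhead = costs.sync_overhead_us if polling else costs.async_overhead_us
    return costs.device_us + overhead


def cpu_busy_us(costs: IoCosts, polling: bool) -> float:
    """CPU time consumed by one I/O.

    Polling spins for the full device time; the interrupt path only
    spends CPU on its software overhead (assumed model).
    """
    if polling:
        return costs.device_us + costs.sync_overhead_us
    return costs.async_overhead_us


def polling_saves_cpu(costs: IoCosts) -> bool:
    """True when polling uses less CPU time than the interrupt path."""
    return cpu_busy_us(costs, polling=True) < cpu_busy_us(costs, polling=False)
```

Under this model polling always has the lower latency when its software overhead is smaller, but it also uses less CPU time only when the device time is shorter than the difference between the two overheads. With a hypothetical `IoCosts(2.0, 1.0, 6.0)` polling wins on both counts; with `IoCosts(100.0, 1.0, 6.0)` it still has lower latency but spends far more CPU time spinning.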