Emerging fast, non-volatile memories (e.g., phase change memories, spin-torque MRAMs, and memristors) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based SSDs. As a result, software overheads that had little impact on the performance of flash-based systems can become serious bottlenecks in systems that incorporate these new technologies. We describe a novel storage hardware and software architecture that nearly eliminates two sources of this overhead: entering the kernel and performing file system permission checks. The new architecture provides a private, virtualized interface for each process and moves file system protection checks into hardware. Applications can therefore access file data without operating system intervention, eliminating OS and file system costs entirely for most accesses. We describe the support the system provides for fast permission checks in hardware, our approach to notifying applications when requests complete, and the small, easily portable changes required in the file system to support the new access model. Existing applications require no modification to use the new interface. We evaluate the performance of the system using a suite of microbenchmarks and database workloads and show that the new interface improves latency and bandwidth for 4 KB writes by 60% and 7.2x, respectively, OLTP database transaction throughput by up to 2.0x, and Berkeley-DB throughput by up to 5.7x. A streamlined asynchronous file I/O interface built to take full advantage of the new interface enables an additional 5.5x increase in throughput with one thread and a 2.8x increase in efficiency for 512 B transfers.
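To make the access model concrete, here is a minimal sketch in Python of how a device-side permission check might behave. It is a toy model under stated assumptions, not the system's actual design: the names `Extent`, `ToyDevice`, `install`, and `submit` are hypothetical, as are the per-channel table layout and the block-range granularity. The only ideas taken from the abstract are that each process has its own private interface (called a channel here) and that the permission check happens in the device rather than in the kernel.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    """A contiguous block range that one channel may access (hypothetical layout)."""
    start: int       # first block of the range
    length: int      # number of blocks in the range
    writable: bool   # False means the range is read-only

    def covers(self, block: int, count: int) -> bool:
        # True when the request [block, block + count) lies inside this range.
        return self.start <= block and block + count <= self.start + self.length


@dataclass
class ToyDevice:
    """Stand-in for the storage device: one permission table per channel."""
    tables: dict = field(default_factory=dict)  # channel id -> list of Extent

    def install(self, channel: int, extent: Extent) -> None:
        # Slow path (assumed): only the trusted file system adds entries.
        self.tables.setdefault(channel, []).append(extent)

    def submit(self, channel: int, block: int, count: int, write: bool) -> str:
        # Fast path: the check runs in the "device", with no kernel entry.
        for extent in self.tables.get(channel, []):
            if extent.covers(block, count) and (extent.writable or not write):
                return "ok"
        return "denied"


dev = ToyDevice()
dev.install(channel=1, extent=Extent(start=100, length=8, writable=False))
print(dev.submit(1, 100, 8, write=False))  # channel 1 reads its own range
print(dev.submit(1, 100, 8, write=True))   # same range, but it is read-only
print(dev.submit(2, 100, 8, write=False))  # channel 2 has no entry at all
```

In this sketch a denied request simply returns a string. A real design would need some path back to the operating system to resolve such cases, which the abstract does not detail.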