Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Global arrays: a nonuniform memory access programming model for high-performance computers
The Journal of Supercomputing
From single core to multi-core: preparing for a new exponential
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Disk schedulers for solid state drivers
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Better I/O through byte-addressable, persistent memory
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
FlexSC: flexible system call scheduling with exception-less system calls
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Providing safe, user space access to fast, solid state disks
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
The bleak future of NAND flash memory
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
When poll is better than interrupt
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
FIOS: a fair, efficient flash I/O scheduler
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Using vector interfaces to deliver millions of IOPS from a networked key-value storage server
Proceedings of the Third ACM Symposium on Cloud Computing
Hi-index | 0.00 |
The IO performance of storage devices has accelerated from hundreds of IOPS five years ago, to hundreds of thousands of IOPS today, and tens of millions of IOPS projected in five years. This sharp evolution is primarily due to the introduction of NAND-flash devices and their data parallel design. In this work, we demonstrate that the block layer within the operating system, originally designed to handle thousands of IOPS, has become a bottleneck to overall storage system performance, specially on the high NUMA-factor processors systems that are becoming commonplace. We describe the design of a next generation block layer that is capable of handling tens of millions of IOPS on a multi-core system equipped with a single storage device. Our experiments show that our design scales graciously with the number of cores, even on NUMA systems with multiple sockets.