Runtime I/O re-routing + throttling on HPC storage

  • Authors:
  • Qing Liu, Norbert Podhorszki, Jeremy Logan, Scott Klasky

  • Affiliations:
  • Computer Science and Mathematics Division, Oak Ridge National Laboratory (all authors)

  • Venue:
  • HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
  • Year:
  • 2013


Abstract

Massively parallel storage systems are becoming increasingly prevalent on HPC systems due to the emergence of a new generation of data-intensive applications. To achieve the I/O throughput and capacity demanded by data-intensive applications, storage systems typically deploy a large number of storage devices (also known as LUNs or data stores). This allows parallel applications to access storage concurrently, so the aggregate I/O throughput scales linearly with the number of storage devices, reducing the application's end-to-end time. On a production system where storage devices are shared among multiple applications, contention is often a major problem, leading to a significant reduction in I/O throughput. In this paper, we describe our efforts to resolve this issue in the context of HPC using a balanced re-routing + throttling approach. The proposed scheme re-routes I/O requests to less congested storage locations in a controlled manner so that write performance is improved while the impact on reads is limited.
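
To make the idea concrete, the following is a minimal, hypothetical sketch (in Python, not from the paper) of congestion-aware write re-routing with a throttle: each write is redirected to a less loaded storage target, but only up to a bounded fraction of requests so that reads served by the alternate targets are not swamped. All names and parameters here (StorageTarget, choose_target, REROUTE_FRACTION) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of balanced re-routing + throttling; not the paper's code.
import random
from dataclasses import dataclass


@dataclass
class StorageTarget:
    """One storage device (LUN / data store) with an observed load estimate."""
    name: str
    pending_bytes: int = 0  # proxy for current congestion on this target


REROUTE_FRACTION = 0.25  # throttle: cap on the share of writes that may be redirected


def choose_target(default: StorageTarget,
                  candidates: list[StorageTarget],
                  rerouted: int,
                  total: int) -> StorageTarget:
    """Pick a less congested target for a write, but stay within the re-route budget
    so that reads on the alternate targets are not overwhelmed."""
    least_loaded = min(candidates, key=lambda t: t.pending_bytes)
    over_budget = total > 0 and rerouted / total >= REROUTE_FRACTION
    if least_loaded.pending_bytes < default.pending_bytes and not over_budget:
        return least_loaded
    return default


# Toy usage: issue 1000 writes of 1 MiB each against a heavily contended default target.
targets = [StorageTarget(f"ost{i}", pending_bytes=random.randint(0, 100 << 20))
           for i in range(8)]
default = targets[0]
default.pending_bytes = 500 << 20  # simulate heavy contention on the default target

rerouted = 0
for i in range(1000):
    t = choose_target(default, targets[1:], rerouted, i)
    if t is not default:
        rerouted += 1
    t.pending_bytes += 1 << 20  # account for the 1 MiB write just queued

print(f"re-routed {rerouted} of 1000 writes ({rerouted / 10:.1f}%)")
```

In this sketch, the throttle (REROUTE_FRACTION) is what keeps the scheme "balanced": without it, every write would chase the least loaded device and the redirected traffic would itself congest targets that other applications are reading from.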