Runtime I/O re-routing + throttling on HPC storage

  • Authors:
  • Qing Liu, Norbert Podhorszki, Jeremy Logan, Scott Klasky

  • Affiliations:
  • Computer Science and Mathematics Division, Oak Ridge National Laboratory (all authors)

  • Venue:
  • HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
  • Year:
  • 2013


Abstract

Massively parallel storage systems are becoming increasingly prevalent on HPC systems due to the emergence of a new generation of data-intensive applications. To achieve the I/O throughput and capacity demanded by data-intensive applications, storage systems typically deploy a large number of storage devices (also known as LUNs or data stores). This allows parallel applications to access storage concurrently, so the aggregate I/O throughput scales linearly with the number of storage devices, reducing the application's end-to-end time. On a production system where storage devices are shared among multiple applications, contention is often a major problem, leading to a significant reduction in I/O throughput. In this paper, we describe our efforts to resolve this issue in the context of HPC using a balanced re-routing + throttling approach. The proposed scheme re-routes I/O requests to less congested storage locations in a controlled manner so that write performance is improved while the impact on reads is limited.
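
To make the idea concrete, the following is a minimal, hypothetical sketch (in Python, not from the paper) of congestion-aware write re-routing with a throttle: each write is redirected to a less loaded storage target, but only up to a bounded fraction of requests so that reads served by the alternate targets are not swamped. All names and parameters here (StorageTarget, choose_target, REROUTE_FRACTION) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of balanced re-routing + throttling; not the paper's code.
import random
from dataclasses import dataclass


@dataclass
class StorageTarget:
    """One storage device (LUN / data store) with an observed load estimate."""
    name: str
    pending_bytes: int = 0  # proxy for current congestion on this target


REROUTE_FRACTION = 0.25  # throttle: cap on the share of writes that may be redirected


def choose_target(default: StorageTarget,
                  candidates: list[StorageTarget],
                  rerouted: int,
                  total: int) -> StorageTarget:
    """Pick a less congested target for a write, but stay within the re-route budget
    so that reads on the alternate targets are not overwhelmed."""
    least_loaded = min(candidates, key=lambda t: t.pending_bytes)
    over_budget = total > 0 and rerouted / total >= REROUTE_FRACTION
    if least_loaded.pending_bytes < default.pending_bytes and not over_budget:
        return least_loaded
    return default


# Toy usage: issue 1000 writes of 1 MiB each against a heavily contended default target.
targets = [StorageTarget(f"ost{i}", pending_bytes=random.randint(0, 100 << 20))
           for i in range(8)]
default = targets[0]
default.pending_bytes = 500 << 20  # simulate heavy contention on the default target

rerouted = 0
for i in range(1000):
    t = choose_target(default, targets[1:], rerouted, i)
    if t is not default:
        rerouted += 1
    t.pending_bytes += 1 << 20  # account for the 1 MiB write just queued

print(f"re-routed {rerouted} of 1000 writes ({rerouted / 10:.1f}%)")
```

In this sketch, the throttle (REROUTE_FRACTION) is what keeps the scheme "balanced": without it, every write would chase the least loaded device and the redirected traffic would itself congest targets that other applications are reading from.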