Making a case for distributed file systems at Exascale

  • Authors:
  • Ioan Raicu; Ian T. Foster; Pete Beckman

  • Affiliations:
  • Illinois Institute of Technology, Chicago, IL, USA; Argonne National Laboratory, Argonne, IL, USA; Argonne National Laboratory, Argonne, IL, USA

  • Venue:
  • Proceedings of the third international workshop on Large-scale system and application performance
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Abstract

Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that 2019 will be the year of exascale, with millions of compute nodes and billions of threads of execution. The current architecture of high-end computing systems is decades old and has persisted as we scaled from gigascale to petascale. In this architecture, storage is completely segregated from the compute resources, and the two are connected via a network interconnect. This approach will not scale the several orders of magnitude in concurrency and throughput that exascale requires, and will thus prevent the move from petascale to exascale. At exascale, basic functionality at high concurrency levels will suffer poor performance, which, combined with a system mean time to failure measured in hours, will lead to a performance collapse for large-scale heroic applications. Storage has the potential to be the Achilles' heel of exascale systems. We propose that future high-end computing systems be designed with non-volatile memory on every compute node, allowing every compute node to actively participate in metadata and data management and leveraging many-core processors and the high bisection bandwidth of torus networks. This position paper discusses this revolutionary new distributed storage architecture that will make exascale computing more tractable, touching virtually all disciplines in high-end computing and fueling scientific discovery.