Memory-conscious collective I/O for extreme scale HPC systems

  • Authors:
  • Yin Lu;Yong Chen;Yu Zhuang;Rajeev Thakur

  • Affiliations:
  • Texas Tech University Lubbock, TX;Texas Tech University Lubbock, TX;Texas Tech University Lubbock, TX;Argonne National Laboratory Argonne, IL

  • Venue:
  • Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
  • Year:
  • 2013

Abstract

Upcoming extreme scale platforms are expected to have millions of nodes, each with hundreds to thousands of small cores. The continuing decrease in memory capacity per core and the widening gap between core count and off-chip memory bandwidth pose significant challenges for I/O operations on extreme scale systems. Collective I/O is a critical I/O optimization technique, and these challenges require rethinking the strategy so that it can still exploit the correlation among I/O accesses effectively. In this study, taking the constraints on memory capacity and bandwidth into account, we introduce Memory-Conscious Collective I/O. The new collective I/O strategy restricts aggregation data traffic to disjoint subgroups, coordinates I/O accesses at both the intra-node and inter-node layers, and determines I/O aggregators at run time based on memory consumption and its variance among processes. Preliminary results show that this strategy holds promise for mitigating memory pressure, alleviating contention for memory bandwidth, and improving I/O performance on projected extreme scale HPC systems.
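The two ideas of disjoint subgroups and run-time aggregator selection can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fixed-size contiguous grouping, the "most free memory wins" rule, and the per-rank memory figures are all assumptions made for illustration; the abstract does not specify how subgroups are formed or how memory consumption and variance are weighed.

```python
from typing import Dict, List


def partition_into_subgroups(ranks: List[int], group_size: int) -> List[List[int]]:
    """Split ranks into disjoint, contiguous subgroups (hypothetical grouping rule).

    Aggregation traffic would then stay inside each subgroup.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [ranks[i:i + group_size] for i in range(0, len(ranks), group_size)]


def select_aggregators(subgroups: List[List[int]],
                       free_memory: Dict[int, int]) -> List[int]:
    """Pick one aggregator per subgroup at run time.

    Assumed policy: the rank reporting the most free memory becomes the
    aggregator; ties go to the lowest rank.
    """
    return [max(group, key=lambda r: (free_memory[r], -r)) for group in subgroups]


# Hypothetical free memory per rank, in MiB (illustrative values only).
free_memory = {0: 512, 1: 2048, 2: 1024, 3: 2048,
               4: 256, 5: 128, 6: 4096, 7: 64}

subgroups = partition_into_subgroups(list(range(8)), group_size=4)
aggregators = select_aggregators(subgroups, free_memory)
print(subgroups)    # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(aggregators)  # [1, 6]
```

In a real MPI setting the free-memory figures would have to be gathered from the processes at run time, and a subgroup would more plausibly follow node boundaries so that intra-node and inter-node coordination can be handled separately; the sketch leaves both out.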