Optimizing a parallel runtime system for multicore clusters: a case study

  • Authors:
  • Chao Mei; Gengbin Zheng; Filippo Gioachin; Laxmikant V. Kalé

  • Affiliations:
  • University of Illinois at Urbana-Champaign (all authors)

  • Venue:
  • Proceedings of the 2010 TeraGrid Conference
  • Year:
  • 2010

Abstract

Clusters of multicore nodes have become the most popular option for new HPC systems because of their scalability and performance/cost ratio. The complexity of programming multicore systems underscores the need for powerful and efficient runtime systems that manage resources such as threads and communication subsystems on behalf of applications. In this paper, we study several multicore performance issues on clusters built with Intel, AMD and IBM processors in the context of the Charm++ runtime system. We then present optimization techniques that overcome these issues. The techniques are general enough to apply to other runtime systems as well. We demonstrate the benefits of these optimizations through both synthetic benchmarks and production-quality applications, including NAMD and ChaNGa, on several popular multicore platforms. The optimizations improve the performance of NAMD and ChaNGa by about 20% and 10%, respectively.