Unbalanced tree search on a manycore system using the GPI programming model

  • Authors:
  • Rui Machado;Carsten Lojewski;Salvador Abreu;Franz-Josef Pfreundt

  • Affiliations:
  • Fraunhofer Institut Techno- und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany;Fraunhofer Institut Techno- und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany;University of Evora, Evora, Portugal;Fraunhofer Institut Techno- und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany

  • Venue:
  • Computer Science - Research and Development
  • Year:
  • 2011

Abstract

Recent developments in computer architecture are moving towards systems with large core counts (manycore), which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand a dynamic and asynchronous load-balancing implementation to utilize the full performance of a manycore system. For example, the recently established Graph500 benchmark targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, overlapping computation and communication. In this paper we address the dynamic load balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements over the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
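
To make the idea of work stealing on an unbalanced tree concrete, the following is a minimal single-process sketch in Python. It is not the GPI implementation described in the paper and involves no RDMA or one-sided communication: the workers are simulated in a round-robin loop, and every name in it (`num_children`, `children`, `count_sequential`, `count_with_stealing`), the branching rule, and the "steal half from the busiest worker" policy are assumptions chosen for illustration. Like UTS, the toy tree derives each node's child count from a SHA-1 digest, so the tree is deterministic but irregular.

```python
import hashlib
from collections import deque

ROOT_BRANCHING = 4  # assumed fixed fan-out at the root (illustrative value)
MAX_DEPTH = 8       # assumed depth cut-off that keeps the toy tree small


def num_children(node_id: bytes, depth: int) -> int:
    """Toy branching rule (an assumption, not the UTS specification)."""
    if depth == 0:
        return ROOT_BRANCHING
    if depth >= MAX_DEPTH:
        return 0
    # The first digest byte decides the fan-out: 0 to 3 children.
    return hashlib.sha1(node_id).digest()[0] % 4


def children(node):
    """Generate the children of a (node_id, depth) pair."""
    node_id, depth = node
    return [
        (hashlib.sha1(node_id + bytes([i])).digest(), depth + 1)
        for i in range(num_children(node_id, depth))
    ]


def count_sequential(root):
    """Reference traversal: count all nodes with a single stack."""
    stack, total = [root], 0
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(children(node))
    return total


def count_with_stealing(root, num_workers=4):
    """Simulate work stealing: an idle worker takes half of the oldest
    work items from the currently busiest worker."""
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * num_workers
    steals = 0
    while any(queues):
        for w in range(num_workers):
            q = queues[w]
            if not q:
                victim = max(range(num_workers), key=lambda v: len(queues[v]))
                if len(queues[victim]) < 2:
                    continue  # nothing worth stealing in this round
                for _ in range(len(queues[victim]) // 2):
                    q.append(queues[victim].popleft())  # steal from the old end
                steals += 1
            node = q.pop()  # the owner works depth-first on its newest node
            processed[w] += 1
            q.extend(children(node))
    return processed, steals


root = (b"root", 0)
per_worker, steals = count_with_stealing(root)
print("nodes per worker:", per_worker, "steals:", steals)
```

Stealing from the opposite end of the victim's queue is a common design choice: the oldest entries tend to sit closer to the root and so are more likely to carry large subtrees, which amortizes the cost of a steal. In a distributed setting such as the one the abstract describes, the transfer would be a remote memory operation rather than a local `popleft`.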