Unbalanced tree search on a manycore system using the GPI programming model

Authors:
Rui Machado;Carsten Lojewski;Salvador Abreu;Franz-Josef Pfreundt
Affiliations:
Fraunhofer Institut Techno-und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany;Fraunhofer Institut Techno-und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany;University of Evora, Evora, Portugal;Fraunhofer Institut Techno-und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany
Venue:
Computer Science - Research and Development
Year:
2011

Abstract

Recent developments in computer architecture are moving towards systems with large core counts (manycore), which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand a dynamic and asynchronous load balancing implementation to utilize the full performance of a manycore system. For example, the recently established Graph500 benchmark targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, overlapping computation and communication. In this paper we address the dynamic load balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements over the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
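
The general idea of work stealing on an unbalanced tree can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions and not the implementation described in the paper: it uses no GPI calls, no RDMA, and no real parallelism. The tree rule (`child_count`), the steal-half policy, and all function names and parameters are hypothetical choices made for illustration. Each node's child count is derived from a SHA-1 hash of its identifier, so the tree is irregular but reproducible, and idle workers take the oldest half of a randomly chosen victim's queue.

```python
import hashlib
import random
from collections import deque

MAX_DEPTH = 12      # assumed cap so the illustrative tree is always finite
ROOT_CHILDREN = 8   # assumed fan-out of the root node


def child_count(node: bytes, depth: int) -> int:
    """Hypothetical rule: derive the number of children from a hash of the node."""
    if depth == 0:
        return ROOT_CHILDREN
    if depth >= MAX_DEPTH:
        return 0
    digest = hashlib.sha1(node).digest()
    # About 30% of nodes get 3 children, the rest are leaves: an unbalanced tree.
    return 3 if digest[0] < 77 else 0


def count_sequential() -> int:
    """Reference traversal with a single explicit stack."""
    stack = [(b"root", 0)]
    total = 0
    while stack:
        node, depth = stack.pop()
        total += 1
        for i in range(child_count(node, depth)):
            stack.append((node + bytes([i]), depth + 1))
    return total


def count_with_work_stealing(num_workers: int = 4, seed: int = 0) -> list:
    """Simulate workers in lockstep; returns nodes processed per worker."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append((b"root", 0))      # all work starts on worker 0
    processed = [0] * num_workers
    while any(queues):                  # stop when every queue is empty
        for w in range(num_workers):
            own = queues[w]
            if own:
                # Local work: depth-first, newest node first.
                node, depth = own.pop()
                processed[w] += 1
                for i in range(child_count(node, depth)):
                    own.append((node + bytes([i]), depth + 1))
            else:
                # Idle: pick a random victim and steal its oldest half.
                victim = rng.randrange(num_workers)
                victim_queue = queues[victim]
                if victim != w and len(victim_queue) > 1:
                    for _ in range(len(victim_queue) // 2):
                        own.append(victim_queue.popleft())
    return processed


if __name__ == "__main__":
    per_worker = count_with_work_stealing(num_workers=4, seed=1)
    print("nodes per worker:", per_worker)
    print("total:", sum(per_worker), "sequential:", count_sequential())
```

Stealing from the opposite end of the queue from where the owner works is a common design choice: the oldest entries sit closer to the root and tend to carry larger subtrees, so one steal moves more work. In a distributed setting the victim's queue would live in remote memory, which is where one-sided communication becomes relevant.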