Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers

Authors:
Thomas Heller;Hartmut Kaiser;Andreas Schäfer;Dietmar Fey
Affiliations:
Friedrich-Alexander-University, Erlangen, Germany;Louisiana State University, Louisiana;Friedrich-Alexander-University, Erlangen, Germany;Friedrich-Alexander-University, Erlangen, Germany
Venue:
ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Year:
2013

References:
MULTILISP: a language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems (TOPLAS)
Executing a program on the MIT tagged-token dataflow architecture. Volume II: Parallel Languages on PARLE: Parallel Architectures and Languages Europe
Reevaluating Amdahl's law. Communications of the ACM
A preliminary architecture for a basic data-flow processor. 25 years of the international symposia on Computer architecture (selected papers)
Concurrent control with “readers” and “writers”. Communications of the ACM
Messages as active agents. POPL '82 Proceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
First version of a data flow procedure language. Programming Symposium, Proceedings Colloque sur la Programmation
The incremental garbage collection of processes. Proceedings of the 1977 symposium on Artificial intelligence and programming languages
X10: an object-oriented approach to non-uniform cluster computing. OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Parallel Programmability and the Chapel Language. International Journal of High Performance Computing Applications
LibGeoDecomp: A Grid-Enabled Library for Geometric Decomposition Codes. Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Validity of the single processor approach to achieving large scale computing capabilities. AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
The Cilk++ concurrency platform. Proceedings of the 46th Annual Design Automation Conference
ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications. ICPPW '09 Proceedings of the 2009 International Conference on Parallel Processing Workshops
Preliminary design examination of the ParalleX system from a software and hardware perspective. ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Blue matter: strong scaling of molecular dynamics on blue gene/l. ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Application of the ParalleX execution model to stencil-based problems. Computer Science - Research and Development
Zero-Overhead Interfaces for High-Performance Computing Libraries and Kernels. SCC '12 Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Abstract

With the general availability of PetaFLOP clusters and the advent of heterogeneous machines equipped with special accelerator cards such as the Xeon Phi[2], computer scientists face the difficult task of improving application scalability beyond what is possible with today's conventional techniques and programming models. In addition, the need for highly adaptive runtime algorithms and for applications handling highly inhomogeneous data further impedes our ability to write code that performs and scales well. In this paper we present the advantages of using HPX[19, 3, 29], a general-purpose parallel runtime system for applications of any scale, as a backend for LibGeoDecomp[25] when implementing a three-dimensional N-body simulation with local interactions. We compare scaling and performance results for this application using the HPX and MPI backends for LibGeoDecomp. LibGeoDecomp is a Library for Geometric Decomposition codes; it implements the idea of a user-supplied simulation model, where the library handles the spatial and temporal loops and the data storage. The presented results were obtained from various homogeneous and heterogeneous runs on up to 1024 nodes (16384 conventional cores) combined with up to 16 Xeon Phi accelerators (3856 hardware threads) on TACC's Stampede supercomputer[1]. In the configuration using the HPX backend, more than 0.35 PFLOPS was achieved, which corresponds to a parallel application efficiency of around 79%. Our measurements demonstrate the advantage of the intrinsically asynchronous and message-driven programming model exposed by HPX, which enables better latency hiding, fine- to medium-grain parallelism, and constraint-based synchronization. HPX's uniform programming model simplifies writing highly parallel code for heterogeneous resources.
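The division of labor described for LibGeoDecomp (the user supplies the simulation model, the library owns the spatial and temporal loops and the data storage) can be illustrated with a minimal sketch. This is not LibGeoDecomp's actual API, which is a C++ library. The names `diffuse` and `run_simulation`, the one-dimensional grid, and the periodic boundaries are all assumptions chosen for brevity, and the three-cell average stands in for the paper's three-dimensional N-body interaction.

```python
# Illustrative sketch only: hypothetical names, not LibGeoDecomp's real API.
from typing import Callable, List

# User-supplied model: next state of a cell from (left, centre, right).
CellUpdate = Callable[[float, float, float], float]


def diffuse(left: float, centre: float, right: float) -> float:
    """Example user model: average over a three-cell neighbourhood."""
    return (left + centre + right) / 3.0


def run_simulation(grid: List[float], update: CellUpdate, steps: int) -> List[float]:
    """'Library' side: owns the storage and both loops; never mutates `grid`."""
    n = len(grid)
    old = list(grid)      # buffer holding the current time step
    new = [0.0] * n       # buffer receiving the next time step
    for _ in range(steps):                # temporal loop
        for i in range(n):                # spatial loop
            left = old[(i - 1) % n]       # periodic boundary (an assumption)
            right = old[(i + 1) % n]
            new[i] = update(left, old[i], right)
        old, new = new, old               # swap buffers between steps
    return old


if __name__ == "__main__":
    print(run_simulation([0.0, 3.0, 0.0], diffuse, steps=1))  # [1.0, 1.0, 1.0]
```

Because the user code only ever sees one neighbourhood at a time, the loop nest inside `run_simulation` is free to be split across nodes or scheduled asynchronously without changing the model, which is the property that lets a backend be swapped underneath the same application.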