MULTILISP: a language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems (TOPLAS).
Executing a program on the MIT tagged-token dataflow architecture. PARLE: Parallel Architectures and Languages Europe, Volume II: Parallel Languages.
Communications of the ACM.
A preliminary architecture for a basic data-flow processor. 25 Years of the International Symposia on Computer Architecture (Selected Papers).
Concurrent control with “readers” and “writers”. Communications of the ACM.
POPL '82: Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages.
First version of a data flow procedure language. Programming Symposium, Proceedings Colloque sur la Programmation.
The incremental garbage collection of processes. Proceedings of the 1977 Symposium on Artificial Intelligence and Programming Languages.
X10: an object-oriented approach to non-uniform cluster computing. OOPSLA '05: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.
Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications.
LibGeoDecomp: a grid-enabled library for geometric decomposition codes. Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface.
Validity of the single processor approach to achieving large scale computing capabilities. AFIPS '67 (Spring): Proceedings of the April 18-20, 1967, Spring Joint Computer Conference.
The Cilk++ concurrency platform. Proceedings of the 46th Annual Design Automation Conference.
ParalleX: an advanced parallel execution model for scaling-impaired applications. ICPPW '09: Proceedings of the 2009 International Conference on Parallel Processing Workshops.
Preliminary design examination of the ParalleX system from a software and hardware perspective. ACM SIGMETRICS Performance Evaluation Review, special issue on the 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10).
Blue Matter: strong scaling of molecular dynamics on Blue Gene/L. ICCS '06: Proceedings of the 6th International Conference on Computational Science, Part II.
Application of the ParalleX execution model to stencil-based problems. Computer Science - Research and Development.
Zero-overhead interfaces for high-performance computing libraries and kernels. SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis.
With the general availability of PetaFLOP clusters and the advent of heterogeneous machines equipped with special accelerator cards such as the Xeon Phi [2], computer scientists face the difficult task of improving application scalability beyond what is possible with today's conventional techniques and programming models. In addition, the need for highly adaptive runtime algorithms and for applications that handle highly inhomogeneous data further impedes our ability to write code which performs and scales well. In this paper we present the advantages of using HPX [19, 3, 29], a general-purpose parallel runtime system for applications of any scale, as a backend for LibGeoDecomp [25] to implement a three-dimensional N-body simulation with local interactions. We compare scaling and performance results for this application using the HPX and MPI backends for LibGeoDecomp. LibGeoDecomp is a library for geometric decomposition codes built around a user-supplied simulation model, where the library handles the spatial and temporal loops and the data storage. The presented results are drawn from various homogeneous and heterogeneous runs on up to 1024 nodes (16384 conventional cores) combined with up to 16 Xeon Phi accelerators (3856 hardware threads) on TACC's Stampede supercomputer [1]. In the configuration using the HPX backend, more than 0.35 PFLOPS were achieved, corresponding to a parallel application efficiency of around 79%. Our measurements demonstrate the advantage of the intrinsically asynchronous, message-driven programming model exposed by HPX, which enables better latency hiding, fine- to medium-grained parallelism, and constraint-based synchronization. HPX's uniform programming model simplifies writing highly parallel code for heterogeneous resources.