Many HPC algorithms are highly irregular: they have input-dependent control flow and operate on pointer-based data structures such as trees, graphs, or linked lists. This irregularity makes such algorithms hard to parallelize and to run efficiently on modern HPC systems. In this paper we study the architectural and programming bottlenecks of the OmpSs task-based programming model when implementing irregular applications. We select a sequential N-body simulation code and describe its parallelization using OmpSs. We then analyze the parallel code, focusing on scalability and load balancing. We conclude that, in general, task-based programming models are well suited to exploiting irregular parallelism. Nevertheless, to avoid the overhead of managing load balancing manually, the hardware and runtime will need to collectively support much finer-grained tasks.
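The load-balancing concern can be illustrated with a small sketch. It is not the paper's code and does not use the OmpSs API. It is a Python stand-in built on the standard `concurrent.futures` module, and the names `Node`, `split_at_depth`, `parallel_total_mass`, and `build_unbalanced` are hypothetical. The sketch assumes that task granularity is controlled manually through a cutoff depth: every subtree rooted at the cutoff becomes one task. On an unbalanced tree this yields tasks of very different sizes, which is the kind of imbalance a programmer would otherwise have to tune by hand.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree cell: a mass plus child cells (illustrative only)."""
    mass: float = 0.0
    children: List["Node"] = field(default_factory=list)


def total_mass(node: Node) -> float:
    """Sequential traversal: the work performed inside a single task."""
    return node.mass + sum(total_mass(c) for c in node.children)


def split_at_depth(node: Node, cutoff: int) -> Tuple[float, List[Node]]:
    """Walk the top of the tree sequentially.

    Returns the mass accumulated above the cutoff and the list of
    subtrees rooted at the cutoff depth (or at a leaf reached earlier).
    """
    if cutoff == 0 or not node.children:
        return 0.0, [node]
    upper, roots = node.mass, []
    for child in node.children:
        child_upper, child_roots = split_at_depth(child, cutoff - 1)
        upper += child_upper
        roots.extend(child_roots)
    return upper, roots


def parallel_total_mass(root: Node, cutoff: int, workers: int = 4) -> float:
    """Run one task per subtree at the cutoff depth and combine the results."""
    upper, roots = split_at_depth(root, cutoff)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return upper + sum(pool.map(total_mass, roots))


def build_unbalanced(depth: int) -> Node:
    """Irregular test tree: each level holds one leaf and one deeper subtree."""
    node = Node(mass=1.0)
    if depth > 0:
        node.children = [Node(mass=1.0), build_unbalanced(depth - 1)]
    return node


if __name__ == "__main__":
    tree = build_unbalanced(10)
    _, task_roots = split_at_depth(tree, 3)
    # Work per task, measured here as the mass (node count) of each subtree.
    print([total_mass(r) for r in task_roots])
    print(parallel_total_mass(tree, cutoff=3))
```

A deeper cutoff creates more, smaller tasks and evens out the work per task, but each extra task adds creation and scheduling cost. Python threads are used here only to show the structure; they say nothing about the performance of a real task runtime.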