The long-foreseen goal of parallel programming models is to scale parallel code without significant programming effort. Irregular applications are a particularly challenging domain for such models, since they require domain-specific data distribution and load-balancing algorithms. From a performance perspective, shared-memory models still fall short of the scalability of message-passing models on irregular applications, although they require less coding effort. We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface. We argue that our parallelization methodology requires minimal effort from the programmer, and we show experimentally that it scales two highly irregular codes as well as MPI does, with an order of magnitude less programming effort. To our knowledge, this is the first time such a result has been obtained with OpenMP, and it is achieved while keeping the OpenMP API intact.