OpenMP has been very successful in exploiting structured parallelism in applications. As applications grow more complex, there is a growing need to handle irregular parallelism in the presence of complicated control structures, as shown by the various efforts in industry and the research community to solve this challenging problem. One of the primary goals of OpenMP 3.0 is to define a standard dialect for expressing and efficiently exploiting unstructured parallelism. This paper presents the design of the OpenMP tasking model, as described by members of the OpenMP 3.0 tasking sub-committee, which was formed for this purpose. It summarizes the sub-committee's efforts, spanning more than two years, in designing, evaluating, and seamlessly integrating the tasking model into the OpenMP specification. We present the design goals and key features of the tasking model, including a rich set of examples and an in-depth discussion of the rationale behind various design choices. We compare a prototype implementation of the tasking model with existing models and evaluate it on a wide range of applications. The comparison shows that the OpenMP tasking model provides expressiveness and flexibility, along with substantial potential for performance and scalability.
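As an illustration of the kind of irregular parallelism the abstract refers to, the following is a minimal sketch of a linked-list traversal using the OpenMP 3.0 task construct. It is not taken from the paper: the node_t type, the process function, and process_list are hypothetical names chosen for this example. The trip count of the loop is not known in advance, so a worksharing loop does not fit well; instead, one thread walks the list and creates a task per node.

```c
#include <stddef.h>

/* Hypothetical list node used only for this sketch. */
typedef struct node {
    int value;          /* input for this element */
    int result;         /* output written by the task */
    struct node *next;  /* next element, or NULL at the end */
} node_t;

/* Stand-in for arbitrary per-element work (assumption: independent per node). */
static int process(int v)
{
    return v * v;
}

/* Walk the list once and create one task per node. */
void process_list(node_t *head)
{
    #pragma omp parallel
    {
        /* Only one thread traverses the list and generates tasks;
           the other threads in the team are free to execute them. */
        #pragma omp single
        {
            for (node_t *p = head; p != NULL; p = p->next) {
                /* firstprivate(p) captures the current node pointer
                   at the moment the task is created. */
                #pragma omp task firstprivate(p)
                p->result = process(p->value);
            }
        }
        /* The implicit barrier at the end of the single construct
           ensures all generated tasks have completed. */
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same results. With OpenMP enabled (for example, -fopenmp in GCC), each node's work may run on any thread in the team.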