Tasking in OpenMP 3.0 has been conceived to handle the dynamic generation of unstructured parallelism. New directives have been added allowing the user to identify units of independent work (tasks) and to define points to wait for the completion of tasks (task barriers). In this paper we propose an extension to allow the runtime detection of dependencies between generated tasks, broadening the range of applications that can benefit from tasking or improving performance when load balancing or locality is critical. Furthermore, the paper describes our proof-of-concept implementation (SMP Superscalar) and shows preliminary performance results on an SGI Altix 4700.
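To make the idea of runtime dependency detection concrete, the following is a minimal sketch in Python. It is not the SMP Superscalar implementation and does not use OpenMP syntax. It assumes each task declares the data it reads and writes, and that a toy runtime derives the ordering constraints from those declarations. All names here (`TaskGraph`, `submit`, `run`, and the `inputs`/`outputs` parameters) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class TaskGraph:
    """Toy runtime that infers task dependencies from declared data accesses.

    Hypothetical illustration only: names and structure are assumptions,
    not the interface proposed in the paper.
    """

    def __init__(self):
        self.tasks = []                    # (name, function) in creation order
        self.deps = {}                     # task id -> ids it must wait for
        self.last_writer = {}              # datum -> id of the last task writing it
        self.readers = defaultdict(list)   # datum -> ids reading it since last write

    def submit(self, name, fn, inputs=(), outputs=()):
        tid = len(self.tasks)
        self.tasks.append((name, fn))
        deps = set()
        for d in inputs:
            if d in self.last_writer:      # read-after-write
                deps.add(self.last_writer[d])
        for d in outputs:
            if d in self.last_writer:      # write-after-write
                deps.add(self.last_writer[d])
            deps.update(self.readers[d])   # write-after-read
        self.deps[tid] = deps
        # Record this task's accesses only after its dependencies are known.
        for d in inputs:
            self.readers[d].append(tid)
        for d in outputs:
            self.last_writer[d] = tid
            self.readers[d] = []
        return tid

    def run(self):
        """Run tasks in waves; tasks in one wave have no mutual dependencies."""
        done, waves = set(), []
        while len(done) < len(self.tasks):
            ready = [t for t in range(len(self.tasks))
                     if t not in done and self.deps[t] <= done]
            for t in ready:                # executed serially in this sketch
                self.tasks[t][1]()
            done.update(ready)
            waves.append([self.tasks[t][0] for t in ready])
        return waves


data = {}
g = TaskGraph()
g.submit("init_a", lambda: data.update(a=1), outputs=["a"])
g.submit("init_b", lambda: data.update(b=2), outputs=["b"])
g.submit("sum", lambda: data.update(c=data["a"] + data["b"]),
         inputs=["a", "b"], outputs=["c"])
g.submit("double_b", lambda: data.update(d=2 * data["b"]),
         inputs=["b"], outputs=["d"])
g.submit("scale_a", lambda: data.update(a=10 * data["a"]),
         inputs=["a"], outputs=["a"])

waves = g.run()
print(waves)
print(data)
```

In this sketch the programmer never orders the tasks explicitly: `sum` and `double_b` land in the same wave because they touch no conflicting data, while `scale_a` is held back until `sum` has read `a`. A real runtime would dispatch each ready task to a worker thread instead of running waves serially.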