Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
High performance Fortran language specification
ACM SIGPLAN Fortran Forum
Exploiting locality and tolerating remote memory access latency using thread migration
International Journal of Parallel Programming - Special issue: selected papers from PACT'96, fourth international conference on parallel architectures and compilation techniques—part 1
Extending OpenMP for NUMA machines
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Using Hardware Counters to Automatically Improve Memory Performance
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Hardware profile-guided automatic page placement for ccNUMA systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A transparent runtime data distribution engine for OpenMP
Scientific Programming
Scaling non-regular shared-memory codes by reusing custom loop schedules
Scientific Programming - OpenMP
Data and thread affinity in openmp programs
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Enabling high-performance memory migration for multithreaded applications on LINUX
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
OpenMP and NUMA architectures I: Investigating memory placement on the SGI origin 3000
ICCS'03 Proceedings of the 2003 international conference on Computational science
Scheduling dynamic OpenMP applications over multicore architectures
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Affinity-on-next-touch: an extension to the Linux kernel for NUMA architectures
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Hi-index | 0.00 |
In modern NUMA architectures, preserving data access locality is a key issue to guarantee performance. We define, for the OpenMP programming model, a type of architecture-agnostic programmer hint to describe the behaviour of parallel loops. These hints are only related to features of the program, in particular to the data accessed by each loop iteration. The runtime will then combine this information with architectural information gathered during its initialization, to guide task scheduling, in case of dynamic loop iteration scheduling. We prove the effectiveness of the proposed technique on the NAS parallel benchmark suite, achieving an average speedup of 1.21x.