Data parallel programming has been widely used to develop scientific applications on various types of parallel machines: SIMD machines, MIMD distributed-memory machines, and UMA shared-memory machines. On NUMA shared-memory machines, data locality is the key to good performance of parallel applications. In this paper, we propose a set of macros (NUMACROS) for data parallel programming on NUMA machines. NUMACROS aims to achieve both ease of programming and the data locality needed for performance. Programs written using NUMACROS are nearly as short and readable as sequential versions of the same programs. To obtain data locality, data and loops are distributed and partitioned among the processors in a coordinated fashion. Although a global address space facilitates data distribution on NUMA systems, a naive implementation of an application will suffer from high overheads. To reduce these costs, a number of approaches have been proposed and evaluated, including index precomputing, index checking, and loop transformation. Our experimental results on the Hector multiprocessor show that these approaches are effective. While such facilities will be provided by compilers in the long run, NUMACROS is a helpful interim step.