Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport

Authors:
Joseph Antony;Pete P. Janes;Alistair P. Rendell
Affiliations:
Department of Computer Science, Australian National University, Canberra, Australia;Department of Computer Science, Australian National University, Canberra, Australia;Department of Computer Science, Australian National University, Canberra, Australia
Venue:
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Year:
2006

Abstract

Modern shared-memory multiprocessor systems commonly have non-uniform memory access (NUMA) with asymmetric memory bandwidth and latency characteristics. Operating systems now provide application programming interfaces (APIs) that allow the user to perform specific thread and memory placement. To date, however, there have been relatively few detailed assessments of the importance of memory/thread placement for complex applications. This paper outlines a framework for performing memory and thread placement experiments on Solaris and Linux. Thread binding, location-specific memory allocation, and the verification of both are discussed and contrasted. Using the framework, the performance characteristics of serial versions of lmbench, STREAM and various BLAS libraries (ATLAS, GOTO, ACML on Opteron/Linux and Sunperf on Opteron, UltraSPARC/Solaris) are measured on two different hardware platforms (UltraSPARC/FirePlane and Opteron/HyperTransport). A simple model describing performance as a function of memory distribution is proposed and assessed for both the Opteron and UltraSPARC.
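
As a rough illustration of the thread binding and verification the abstract refers to, the following is a minimal sketch for Linux only. It is not the authors' framework: the helper name bind_to_cpu is hypothetical, and the sketch assumes Python's os.sched_setaffinity and os.sched_getaffinity, which wrap the Linux affinity system calls. Solaris offers a different interface (processor_bind), and location-specific memory allocation is not covered here.

```python
import os


def bind_to_cpu(cpu: int) -> set[int]:
    """Pin the calling thread to one CPU and return the resulting affinity set.

    Hypothetical helper for illustration; Linux only.
    """
    # A pid of 0 refers to the calling thread.
    os.sched_setaffinity(0, {cpu})
    # Read the affinity back to verify that the binding took effect.
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)  # CPUs this thread may currently use
    target = min(allowed)              # arbitrary choice for the example
    print("allowed CPUs:", sorted(allowed))
    print("bound to:", sorted(bind_to_cpu(target)))
```

Reading the affinity set back after setting it mirrors, in a very small way, the idea of verifying placement rather than assuming it succeeded.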