Memory access behavior analysis of NUMA-based shared memory programs

Authors:
Jie Tao;Wolfgang Karl;Martin Schulz
Affiliations:
LRR-TUM, Institut für Informatik, Technische Universität München, Germany. Tel: +49-89-289-{28397,28278,28399}/ E-mail: tao@in.tum.de (Staff member of Jilin Univ., China, pursuing a ...;LRR-TUM, Institut für Informatik, Technische Universität München, 80290 München, Germany. Tel: +49-89-289-{28397,28278,28399}/ E-mail: {tao,karlw,schulzm}@in.tum.de;LRR-TUM, Institut für Informatik, Technische Universität München, 80290 München, Germany. Tel: +49-89-289-{28397,28278,28399}/ E-mail: {tao,karlw,schulzm}@in.tum.de
Venue:
Scientific Programming
Year:
2002


Abstract

Shared memory applications running transparently on top of NUMA architectures often suffer severe performance problems caused by poor data locality and excessive remote memory accesses. Optimizing data locality is therefore necessary, but it requires a fundamental understanding of an application's memory access behavior. Simple code instrumentation cannot provide this information because the NUMA hardware handles communication implicitly, a large amount of traffic is produced at runtime, and shared memory codes access memory at fine granularity. This paper presents an approach that overcomes these problems and thereby enables an easy and efficient optimization process. It combines a low-level hardware monitoring facility with a comprehensive visualization tool to generate memory access histograms that show all memory accesses across the complete address space of an application's working set. This information can be used to identify access hot spots, to understand the dynamic behavior of shared memory applications, and to optimize applications through an application-specific data layout, resulting in significant performance improvements.
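
To make the idea of a memory access histogram concrete, the following is a minimal Python sketch and not the authors' tool. It assumes a software trace of (node, address) pairs as a stand-in for the data the paper obtains from hardware monitoring. The 4096-byte page size, the function names, and the "place each page on its most frequent accessor" heuristic are all illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


def build_histogram(trace, page_size=PAGE_SIZE):
    """Count accesses per page, broken down by the accessing node.

    trace is an iterable of (node, address) pairs; this is an assumed
    stand-in for data a hardware monitor would deliver.
    """
    histogram = defaultdict(Counter)
    for node, address in trace:
        histogram[address // page_size][node] += 1
    return histogram


def remote_accesses(histogram, home_of):
    """Total accesses issued by nodes other than each page's home node."""
    return sum(
        count
        for page, per_node in histogram.items()
        for node, count in per_node.items()
        if node != home_of[page]
    )


def suggest_layout(histogram):
    """Assumed heuristic: place each page on the node that accesses it most."""
    return {
        page: per_node.most_common(1)[0][0]
        for page, per_node in histogram.items()
    }


# Two nodes, two pages; each page is mostly used by the "other" node.
trace = [(0, 0), (1, 100), (1, 200), (1, 4096), (0, 5000), (0, 6000)]
histogram = build_histogram(trace)
initial_layout = {0: 0, 1: 1}  # page -> home node (hypothetical default)
improved_layout = suggest_layout(histogram)
```

Comparing `remote_accesses(histogram, initial_layout)` with `remote_accesses(histogram, improved_layout)` shows how such a histogram can guide a data layout decision: pages dominated by one node's accesses are candidates for placement on that node.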