Interactive locality optimization on NUMA architectures

Authors: Tao Mu; Jie Tao; Martin Schulz; Sally A. McKee
Affiliations: Technische Universität München; Technische Universität München; Cornell University; Cornell University
Venue: Proceedings of the 2003 ACM symposium on Software visualization
Year: 2003

Abstract

Optimizing the performance of shared-memory NUMA programs remains something of a black art, requiring application writers to possess a deep understanding of their programs' behavior. This difficulty is one of the remaining obstacles to the widespread adoption and deployment of these cost-efficient, scalable shared-memory NUMA architectures. To address this problem, we have developed a performance monitoring infrastructure and a corresponding set of tools that help users visualize and understand the subtleties of the memory access behavior of parallel NUMA applications with large datasets. The tools are designed to be general, interoperable, and easily portable. We give detailed examples of the use of one particular tool in the set. We have used this memory access visualization tool profitably on a range of applications, improving performance by around 90% on average.
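To make "memory access behavior" on a NUMA machine more concrete, the sketch below summarizes a memory access trace as a per-page histogram of which node touched each page, then flags pages that are accessed more remotely than locally. This is not the authors' tool or data format. It assumes a trace of `(node, address)` pairs, 4 KiB pages, and a known home node for every traced page; the names `access_histogram`, `placement_report`, and `migration_candidates` are hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def access_histogram(trace, page_size=PAGE_SIZE):
    """Count accesses per page, broken down by the node that issued them.

    trace: iterable of (node, address) pairs (hypothetical trace format).
    Returns {page: {node: count}}.
    """
    hist = defaultdict(lambda: defaultdict(int))
    for node, addr in trace:
        hist[addr // page_size][node] += 1
    return hist


def placement_report(trace, home, page_size=PAGE_SIZE):
    """For each page, return (local, remote, best_node).

    home: {page: node} giving where each page currently resides; every
    page that appears in the trace must have an entry (assumption).
    best_node is the node that accessed the page most often; ties go to
    the lowest-numbered node.
    """
    report = {}
    for page, counts in access_histogram(trace, page_size).items():
        total = sum(counts.values())
        local = counts.get(home[page], 0)
        best = max(counts, key=lambda n: (counts[n], -n))
        report[page] = (local, total - local, best)
    return report


def migration_candidates(report, home):
    """Pages whose heaviest user is not their home node and that see more
    remote than local accesses, in ascending page order."""
    return [
        page
        for page, (local, remote, best) in sorted(report.items())
        if best != home[page] and remote > local
    ]


if __name__ == "__main__":
    # Two pages, both homed on node 0; node 1 dominates accesses to page 0.
    trace = [(0, 0x0000), (1, 0x0010), (1, 0x0020), (1, 0x1000), (0, 0x1008)]
    home = {0: 0, 1: 0}
    report = placement_report(trace, home)
    print(report)                              # {0: (1, 2, 1), 1: (1, 1, 0)}
    print(migration_candidates(report, home))  # [0]
```

A visualization tool would display data like this report, for example as a per-node access chart for each page, rather than acting on it automatically. The migration rule here (move a page to its most frequent accessor) is one simple heuristic chosen for illustration.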