Impact of CC-NUMA Memory Management Policies on the Application Performance of Multistage Switching Networks

Authors:
L. N. Bhuyan; H. Wang; R. Iyer
Affiliations:
Texas A&M Univ., College Station; Texas A&M Univ., College Station; Intel Corp., Santa Clara, CA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2000

Abstract

In this paper, we study in detail the impact of memory management policies and switch design alternatives on the application performance of cache-coherent nonuniform memory access (CC-NUMA) multiprocessors. Memory management plays an important role in the performance of NUMA multiprocessors because it dictates how data are placed among the distributed memory modules. We analyze memory traces of several scientific applications under three memory management techniques, namely the buddy, round-robin, and first-touch policies, and compare their memory system performance. We present interconnection network switch designs that incorporate virtual channels and varying numbers of input buffers per switch. Our performance evaluation uses an execution-driven simulation methodology to capture the dynamic changes in network traffic as the applications execute. We show that cut-through switching with buffers and virtual channels can greatly reduce the average message latency. Because the choice of memory management policy affects both the amount of network traffic and the network access pattern, we vary the policy and confirm the performance benefits of the improved switch designs. We also present sensitivity studies that vary switch design parameters, cache block size, and memory page size. We find that combining the first-touch memory management policy with a switch design that has virtual channels and increased buffer space can reduce the average message latency by as much as 70 percent.
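The difference between two of the placement policies named above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' execution-driven simulator. It assumes 4 KB pages and four nodes, treats round-robin placement as homing page number i at module i mod N, treats first-touch as homing a page at the node that first references it, ignores caches (so every access is counted), and uses a hypothetical trace in which each node repeatedly touches its own two-page block of a shared array. The buddy policy is not modeled because the abstract does not define its placement rule. The names `simulate`, `round_robin_home`, `PAGE_SIZE`, and `NUM_NODES` are illustrative only.

```python
from collections import Counter

PAGE_SIZE = 4096  # bytes per page (assumed)
NUM_NODES = 4     # processor/memory nodes (assumed)


def round_robin_home(page: int, num_nodes: int = NUM_NODES) -> int:
    """Round-robin placement (assumed form): page i is homed at module i mod num_nodes."""
    return page % num_nodes


def simulate(trace, policy, page_size=PAGE_SIZE, num_nodes=NUM_NODES):
    """Count local vs. remote accesses for a trace of (node, address) pairs.

    policy is "round_robin" or "first_touch". Caches are ignored, so every
    access in the trace is counted.
    """
    home = {}           # page number -> home node
    counts = Counter()  # "local"/"remote" -> number of accesses
    for node, addr in trace:
        page = addr // page_size
        if page not in home:  # first reference to this page: place it
            if policy == "round_robin":
                home[page] = round_robin_home(page, num_nodes)
            elif policy == "first_touch":
                home[page] = node  # home the page at the first referencing node
            else:
                raise ValueError(f"unknown policy: {policy}")
        counts["local" if home[page] == node else "remote"] += 1
    return counts["local"], counts["remote"]


# Hypothetical trace: three sweeps in which each node touches two words in
# each page of its own two-page block of a shared array (blocked partition).
trace = [
    (node, (node * 2 + p) * PAGE_SIZE + offset)
    for _ in range(3)
    for node in range(NUM_NODES)
    for p in range(2)
    for offset in (0, 64)
]

print("first_touch", simulate(trace, "first_touch"))  # first_touch (48, 0)
print("round_robin", simulate(trace, "round_robin"))  # round_robin (12, 36)
```

For this blocked access pattern, first-touch makes every access local, while round-robin leaves three quarters of them remote; in a real CC-NUMA machine, remote accesses that miss in the cache become messages on the interconnection network. The sketch also hints at first-touch's weakness: if a single node touched all the data first (for example, during serial initialization), every page would be homed on that node.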