Design and analysis of static memory management policies for CC-NUMA Multiprocessors

Authors:
Ravishankar Iyer; Hujun Wang; Laxmi Narayan Bhuyan
Affiliations:
Ravishankar Iyer: Intel Corporation, 15220 N.W. Greenbrier Parkway, Beaverton, OR
Hujun Wang: Intel Corporation, 15220 N.W. Greenbrier Parkway, Beaverton, OR and Department of Computer Science, Texas A&M University, College Station, TX
Laxmi Narayan Bhuyan: Intel Corporation, 15220 N.W. Greenbrier Parkway, Beaverton, OR and Department of Computer Science, University of California, Riverside, CA
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2002

Abstract

In this paper, we characterize the performance of three existing memory management policies: buddy, round-robin, and first-touch. With these existing schemes, we find several cases where requests from different processors arrive at the same memory simultaneously. To alleviate this contention, we present two improved memory management policies, skew-mapping and prime-mapping. By exploiting the properties of skewing and of prime numbers, the improved policies considerably improve the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors. We also re-evaluate the performance of a multistage interconnection network under both the existing and the improved policies. Our results show how the benefit of each memory management technique depends on the sharing patterns of the application. Applications with a low degree of sharing benefit from the data locality provided by first-touch. However, several applications with a significant degree of sharing, as well as those with single-processor initialization routines, benefit greatly from the intelligent distribution of data provided by the skew-mapping and prime-mapping schemes. The new schemes improve stall time by as much as 35%.
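The abstract does not give the exact mapping functions, so the following is only an illustrative sketch of how static page-to-node placement policies of this kind might behave. It assumes textbook formulations: round-robin as `page % nodes`, skewing as a row-shifted modulo, and prime mapping as a modulo by the largest prime not exceeding the node count. All function names, the 8-node configuration, and the strided access pattern are hypothetical and are not taken from the paper.

```python
from collections import Counter

def round_robin(page, nodes):
    # Assumed formulation: pages are dealt to nodes in cyclic order.
    return page % nodes

def skew_mapping(page, nodes):
    # Assumed textbook skewing: each "row" of `nodes` pages is shifted by one node.
    return (page + page // nodes) % nodes

def largest_prime_at_most(n):
    # Helper for the sketch; requires n >= 2.
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    return next(k for k in range(n, 1, -1) if is_prime(k))

def prime_mapping(page, nodes):
    # Assumed formulation: take the modulus over a prime number of nodes.
    return page % largest_prime_at_most(nodes)

def first_touch(page, toucher_node):
    # The page lives on the node of the processor that touches it first.
    return toucher_node

def single_init(page, nodes):
    # First-touch when one processor (node 0 here) initializes all data.
    return first_touch(page, toucher_node=0)

def max_load(policy, pages, nodes):
    # Largest number of the given pages that land on any single node.
    return max(Counter(policy(p, nodes) for p in pages).values())

NODES = 8  # hypothetical machine size
# Hypothetical access pattern: pages with a stride equal to the node count.
strided_pages = [k * NODES for k in range(NODES)]

if __name__ == "__main__":
    for policy in (round_robin, skew_mapping, prime_mapping, single_init):
        print(policy.__name__, max_load(policy, strided_pages, NODES))
```

Under these assumptions, the strided pages all land on one node with round-robin and with single-processor first-touch, while the skewed and prime-based mappings spread them across nodes. This is the kind of simultaneous arrival at one memory that the abstract describes, although the paper's actual policies and workloads may differ.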