A hybrid strategy based on data distribution and migration for optimizing memory locality

Authors:
I. Kadayif;M. Kandemir;A. Choudhary
Affiliations:
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL
Venue:
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Year:
2002

Citing 12
Cited 0

Page placement policies for NUMA multiprocessors

Journal of Parallel and Distributed Computing
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines

Compiling for numa parallel machines
Scheduling and page migration for multiprocessor compute servers

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data distribution support on distributed shared memory multiprocessors

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

The performance of a NUMA architecture depends on the efficient use of local memory. Therefore, software-level techniques that improve memory locality (in addition to parallelism) are extremely important to extract the best performance from these architectures. The proposed solutions so far include OS-based automatic data migrations and compiler-based static/dynamic data distributions. This paper proposes and evaluates a hybrid strategy for optimizing memory locality in NUMA architectures. In this strategy, we employ both compiler-directed data distribution and OS-directed dynamic page migration. More specifically, a given program code is first divided into segments, and then each segment is optimized either using compiler-based data distributions (at compile-time) or using dynamic migration (at runtime). In selecting the optimization strategy to use for a program segment, we use a criterion based on the number of compile-time analyzable references in loops. To test the effectiveness of our strategy in optimizing memory locality of applications, we implemented it and compared its performance with that of several other techniques such as compiler-directed data distribution and OS-directed dynamic page migration. Our experimental results obtained through simulation indicate that our hybrid strategy outperforms other strategies and achieves the best performance for a set of codes with regular, irregular, and mixed (regular + irregular) access patterns.