Page placement policies for NUMA multiprocessors
Journal of Parallel and Distributed Computing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data distribution support on distributed shared memory multiprocessors
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Hi-index | 0.00 |
The performance of a NUMA architecture depends on the efficient use of local memory. Therefore, software-level techniques that improve memory locality (in addition to parallelism) are extremely important to extract the best performance from these architectures. The proposed solutions so far include OS-based automatic data migrations and compiler-based static/dynamic data distributions. This paper proposes and evaluates a hybrid strategy for optimizing memory locality in NUMA architectures. In this strategy, we employ both compiler-directed data distribution and OS-directed dynamic page migration. More specifically, a given program code is first divided into segments, and then each segment is optimized either using compiler-based data distributions (at compile-time) or using dynamic migration (at runtime). In selecting the optimization strategy to use for a program segment, we use a criterion based on the number of compile-time analyzable references in loops. To test the effectiveness of our strategy in optimizing memory locality of applications, we implemented it and compared its performance with that of several other techniques such as compiler-directed data distribution and OS-directed dynamic page migration. Our experimental results obtained through simulation indicate that our hybrid strategy outperforms other strategies and achieves the best performance for a set of codes with regular, irregular, and mixed (regular + irregular) access patterns.