The Amber system: parallel programming on a network of multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Managing pages in shared virtual memory systems: getting the compiler into the game
ICS '93 Proceedings of the 7th international conference on Supercomputing
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Analyses and optimizations for shared address space programs
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Thread migration and its applications in distributed shared memory systems
Journal of Systems and Software
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improving fine-grained irregular shared-memory benchmarks by data reordering
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
SCI: Scalable Coherent Interface, Architecture and Software for High-Performance Compute Clusters
SCI: Scalable Coherent Interface, Architecture and Software for High-Performance Compute Clusters
Visualizing the Memory Access Behavior of Shared Memory Applications on NUMA Architectures
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
SCI Monitoring Hardware and Software: Supporting Performance Evaluation and Debugging
SCI: Scalable Coherent Interface, Architecture and Software for High-Performance Compute Clusters
User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
SMiLE: An Integrated, Multi-Paradigm Software Infrastructure for SCI-Based Clusters
CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
OS Support for Improving Data Locality on CC-NUMA Compute Servers
OS Support for Improving Data Locality on CC-NUMA Compute Servers
Hi-index | 0.00 |
Shared memory programs running on Non-Uniform Memory Access (NUMA) machines usually face inherent performance problems stemming from excessive remote memory accesses. A solution, called the Adaptive Runtime System (ARS), is presented in this paper. ARS is designed to adjust the data distribution at runtime through automatic page migrations. It uses memory access histograms gathered by hardware monitors to find access hot spots and, based on this detection, to dynamically and transparently modify the data layout. In this way, incorrectly allocated data can be moved to the most appropriate node and hence data locality can be improved. Simulations show that this allows to achieve a performance gain of as high as 40%.