Proceedings of the 2007 Summer Computer Simulation Conference
OpenMP has become the dominant standard for shared-memory programming. It is traditionally used on symmetric multiprocessor (SMP) systems, but it has more recently found its way to parallel architectures with distributed shared memory (DSM), such as NUMA (Non-Uniform Memory Access) machines. This combines the ease of use of the OpenMP programming model with the scalability and cost-effectiveness of NUMA architectures. In NUMA environments, however, OpenMP codes suffer from the longer latencies of remote memory accesses, on both hardware and software DSM systems. In this paper we present SIMT/OMP, a simulation environment that models NUMA scenarios and provides comprehensive performance data on interconnect traffic. We use this tool to study the impact of NUMA on the performance of OpenMP applications, and we show how the memory layout of these codes can be improved with the help of a visualization tool. Using these techniques, we achieved performance improvements of up to a factor of five on some of our benchmarks, especially in larger system configurations.
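To illustrate why memory layout matters for remote traffic, the following is a toy model, not part of SIMT/OMP and not its interface. It assumes an array split into fixed-size pages, one thread per node, and a statically scheduled parallel loop in which each thread sweeps one contiguous block of elements. It counts local and remote element accesses under two hypothetical placement policies: all pages on node 0 (as can happen under a first-touch policy when a single thread initializes all data) and pages distributed in blocks that match the loop partition. The page size, policy names, and function names are illustrative assumptions.

```python
"""Toy model of NUMA page placement and the resulting remote accesses.

Hypothetical illustration only (not SIMT/OMP): one thread per node,
a static block schedule, and a fixed, assumed page size.
"""

PAGE_SIZE = 512  # elements per page (assumed value)


def page_home(page, num_pages, num_nodes, policy):
    """Return the node that owns `page` under the given placement policy."""
    if policy == "node0":
        # Every page lives on node 0, e.g. after serial initialization.
        return 0
    if policy == "block":
        # Pages are spread over nodes in contiguous blocks.
        return page * num_nodes // num_pages
    raise ValueError(f"unknown policy: {policy}")


def count_accesses(n, num_nodes, policy):
    """Count (local, remote) accesses for one parallel sweep over n elements."""
    num_pages = (n + PAGE_SIZE - 1) // PAGE_SIZE
    chunk = (n + num_nodes - 1) // num_nodes  # static schedule: one block per thread
    local = remote = 0
    for i in range(n):
        node = i // chunk  # node of the thread that touches element i
        home = page_home(i // PAGE_SIZE, num_pages, num_nodes, policy)
        if node == home:
            local += 1
        else:
            remote += 1
    return local, remote


if __name__ == "__main__":
    for policy in ("node0", "block"):
        print(policy, count_accesses(8192, 4, policy))
    # node0 (2048, 6144)
    # block (8192, 0)
```

With 8192 elements on 4 nodes, placing everything on node 0 makes three quarters of the accesses remote, while the block placement makes them all local. The block result is exact here only because each thread's chunk (2048 elements) is a whole number of pages; with misaligned chunks, some boundary pages would still be accessed remotely.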