Because of the inherent non-uniformity of the memory system, programmers and users of non-uniform memory access (NUMA) machines must pay special attention to the memory performance of their applications. This paper discusses a variety of potential improvements with respect to cache misses, cache invalidations, and inter-node communication. The study is based on the simulation tool SIMT, which models the memory hierarchy in detail and provides complete, accurate information about all dynamic memory references. This information can be used to analyze the memory access behavior of applications and thus forms the basis for any optimization of memory accesses.
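As a rough illustration of how per-reference information can feed such an analysis, the following minimal sketch classifies each memory reference as local or remote to the issuing node. It does not reflect SIMT's actual output format or interface: the `(node, address)` record layout, the 4 KB page size, the round-robin page placement, and the names `home_node` and `summarize` are all assumptions made for this example.

```python
from collections import Counter

PAGE_SIZE = 4096  # bytes; assumed page granularity for this sketch


def home_node(address, num_nodes):
    """Assumed placement: pages are distributed round-robin across nodes."""
    return (address // PAGE_SIZE) % num_nodes


def summarize(trace, num_nodes):
    """Count local and remote references per issuing node.

    trace: iterable of (node, address) pairs (a hypothetical record format).
    """
    local = Counter()
    remote = Counter()
    for node, address in trace:
        if home_node(address, num_nodes) == node:
            local[node] += 1   # page lives on the issuing node
        else:
            remote[node] += 1  # access requires inter-node communication
    return local, remote


# Tiny hand-made trace for a two-node system.
trace = [(0, 0), (0, 4096), (1, 4100), (1, 8192), (0, 64)]
local, remote = summarize(trace, num_nodes=2)
for node in range(2):
    print(f"node {node}: local={local[node]} remote={remote[node]}")
```

A node with a high remote count is a candidate for a different data placement or iteration distribution. A fuller analysis would also need cache state to distinguish misses and invalidations, which this sketch deliberately leaves out.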