This paper makes two contributions. First, it proposes a new methodology for quantifying remote memory access contention on hardware DSM multiprocessors. The most valuable aspect of this methodology is that it assesses the impact of contention on real parallel programs running on real hardware. The methodology takes as input the number of accesses from each DSM node to each page in memory. A trace of the program's memory accesses, obtained at runtime from hardware counters, is used to compute an accurate estimate of the fraction of execution time wasted due to contention. Second, the paper presents a new algorithm that detects potential hot spots in pages and resolves contention on them using dynamic page migration. The algorithm balances remote memory accesses across the nodes of the system while trying to improve memory access locality. Experiments with five parallel codes with irregular memory access patterns on a 128-processor Origin2000 show that our algorithm reduces execution time by 27.7% on average.
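The abstract does not spell out the migration algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It takes per-node, per-page access counts as input, treats a node as a hot spot when the remote accesses it serves exceed a multiple of the system-wide mean, and greedily moves pages off that node to the eligible node that accesses the page most. The names `incoming_remote_load`, `rebalance`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
def remote_accesses(per_node, home_node):
    """Accesses to one page issued by nodes other than its home node."""
    return sum(c for node, c in enumerate(per_node) if node != home_node)


def incoming_remote_load(counts, home, num_nodes):
    """Remote accesses each node must serve for the pages it hosts."""
    load = [0] * num_nodes
    for page, per_node in counts.items():
        load[home[page]] += remote_accesses(per_node, home[page])
    return load


def rebalance(counts, home, num_nodes, threshold=1.5):
    """Greedy heuristic (an assumption, not the paper's algorithm).

    counts: dict page -> list of access counts, one entry per node
    home:   dict page -> node currently hosting the page
    Returns the new placement and a list of (page, source, target) moves.
    """
    home = dict(home)  # work on a copy, leave the caller's mapping intact
    load = incoming_remote_load(counts, home, num_nodes)
    migrations = []
    for hot in range(num_nodes):
        # Pages hosted on this node, heaviest remote traffic first.
        pages = sorted(
            (p for p in counts if home[p] == hot),
            key=lambda p: remote_accesses(counts[p], hot),
            reverse=True,
        )
        for page in pages:
            mean = sum(load) / num_nodes
            if load[hot] <= threshold * mean:
                break  # node is no longer a hot spot
            per_node = counts[page]
            total = sum(per_node)
            best = None
            for target in range(num_nodes):
                if target == hot:
                    continue
                new_load = load[target] + total - per_node[target]
                if new_load >= load[hot]:
                    continue  # would only relocate the hot spot
                # Prefer locality (most accesses), then the lighter node.
                key = (per_node[target], -new_load)
                if best is None or key > best[0]:
                    best = (key, target, new_load)
            if best is None:
                continue
            _, target, new_load = best
            load[hot] -= total - per_node[hot]
            load[target] = new_load
            home[page] = target
            migrations.append((page, hot, target))
    return home, migrations
```

In this sketch, a page accessed mostly by one remote node moves to that node, which removes the traffic from the hot spot and makes the accesses local at the same time. A production implementation would also need to weigh migration cost, which is ignored here.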