Fast Address Translation Techniques for Distributed Shared Memory Compilers

Authors:
Francois Cantonnet;Tarek A. El-Ghazawi;Pascal Lorenz;Jaafer Gaber
Affiliations:
The George Washington University, USA;The George Washington University, USA;Université de Haute Alsace, France;Université de Technologie de Belfort-Montbéliard, France
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Year:
2005

Abstract

The Distributed Shared Memory (DSM) model is designed to combine the ease of programming of the shared memory paradigm with high performance, achieved by expressing locality as in the message-passing model. Experience, however, has shown that DSM programming languages such as UPC may fail to deliver the expected level of performance. Initial investigations have shown that one of the major reasons is the overhead of translating from the UPC memory model to the virtual address space of the target architecture, which can be very costly. Experimental measurements have shown this overhead increasing execution time by up to three orders of magnitude. Previous work has also shown that some of this overhead can be avoided by hand-tuning, which, however, can significantly reduce UPC's ease of use. In addition, such tuning can improve the performance of local shared accesses only, not remote shared accesses. Therefore, a new technique that resembles Translation Lookaside Buffers (TLBs) is proposed here. This technique, called the Memory Model Translation Buffer (MMTB), has been implemented in the GCC-UPC compiler using two alternative strategies, full-table (FT) and reduced-table (RT). It will be shown that the MMTB strategies can lead to a performance boost of up to 700%, preserving ease of programming while performing comparably to hand-tuned UPC and MPI codes.
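
To make the translation cost concrete, the following C sketch shows the index arithmetic implied by UPC's block-cyclic layout of a shared array, followed by a lookup in a small table of per-thread base addresses. It is only an illustration under stated assumptions: the names shared_array_t, shared_loc_t, locate, and translate are hypothetical, the table layout is a guess at what a table-assisted translation might look like, and nothing here is taken from the MMTB implementation or the GCC-UPC runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a block-cyclic shared array.
 * All names are illustrative, not GCC-UPC or MMTB identifiers. */
typedef struct {
    size_t threads;          /* number of UPC threads */
    size_t block_size;       /* elements per block */
    size_t elem_size;        /* bytes per element */
    const uintptr_t *base;   /* assumed table: base[t] is the virtual address
                                where thread t's portion of the array starts */
} shared_array_t;

/* Where a global element index lands in the block-cyclic layout. */
typedef struct {
    size_t thread;        /* thread with affinity to the element */
    size_t phase;         /* position of the element inside its block */
    size_t local_offset;  /* byte offset inside that thread's portion */
} shared_loc_t;

/* The arithmetic part of the translation: divisions and modulos
 * on every access, which is the kind of work a table could avoid. */
static shared_loc_t locate(const shared_array_t *a, size_t i)
{
    shared_loc_t loc;
    size_t block = i / a->block_size;        /* global block number */
    size_t local_block = block / a->threads; /* block number on the owner */

    loc.thread = block % a->threads;
    loc.phase = i % a->block_size;
    loc.local_offset =
        (local_block * a->block_size + loc.phase) * a->elem_size;
    return loc;
}

/* Table-assisted final step: owner's base address plus local offset. */
static uintptr_t translate(const shared_array_t *a, size_t i)
{
    shared_loc_t loc = locate(a, i);
    return a->base[loc.thread] + loc.local_offset;
}
```

With 4 threads, a block size of 2, and 8-byte elements, element 3 falls in global block 1, so it belongs to thread 1 with phase 1 and a local byte offset of 8. A table with more entries would trade memory for fewer arithmetic operations per access; whether that matches the FT and RT strategies named in the abstract is not something this sketch establishes.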