Optimizing OpenMP programs on software distributed shared memory systems

Authors:
Seung-Jai Min;Ayon Basumallik;Rudolf Eigenmann
Affiliations:
School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana;School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana;School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana
Venue:
International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
Year:
2003

Abstract

This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such use is possible and to develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in only a few applications. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.
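The abstract names access privatization as one of the optimizations but does not show code. The following is a minimal sketch of the general idea only, not the authors' implementation: data that each loop iteration uses purely as its own scratch space is moved from a shared array into a private, stack-allocated buffer, so that on a Software DSM it would no longer need to sit in shared pages managed by the coherence protocol. The names `N`, `WIDTH`, `shared_scratch`, `sum_with_shared_scratch`, and `sum_with_private_scratch` are hypothetical, and the code is written against plain OpenMP rather than any particular DSM runtime.

```c
#include <stddef.h>

#define N 64      /* hypothetical number of loop iterations */
#define WIDTH 4   /* hypothetical scratch size per iteration */

/* Baseline: the scratch space is a single shared array. Each iteration
 * touches only its own row, yet the whole array is shared data. On a
 * Software DSM (assumption), such data would live in shared pages. */
static double shared_scratch[N][WIDTH];

double sum_with_shared_scratch(void)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < WIDTH; k++)
            shared_scratch[i][k] = (double)(i + k);
        for (int k = 0; k < WIDTH; k++)
            total += shared_scratch[i][k];
    }
    return total;
}

/* Privatized: the scratch buffer is declared inside the loop body, so
 * every iteration (and therefore every thread) has its own copy and no
 * shared storage is involved for it. */
double sum_with_private_scratch(void)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < N; i++) {
        double scratch[WIDTH];
        for (int k = 0; k < WIDTH; k++)
            scratch[k] = (double)(i + k);
        for (int k = 0; k < WIDTH; k++)
            total += scratch[k];
    }
    return total;
}
```

Both functions compute the same sum; only the placement of the scratch data differs. The pragmas are ignored when the file is compiled without OpenMP support, in which case the loops simply run sequentially.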