Identifying Inter-task Communication in Shared Memory Programming Models

Authors:
Per Larsen;Sven Karlsson;Jan Madsen
Affiliations:
DTU Informatics, Technical University of Denmark;DTU Informatics, Technical University of Denmark;DTU Informatics, Technical University of Denmark
Venue:
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Year:
2009



Abstract

Modern computers often use multi-core architectures, ranging from clusters of homogeneous cores for high-performance computing to the heterogeneous architectures typically found in embedded systems. To program such architectures efficiently, it is important to be able to partition programs and map them onto the cores of the architecture. We believe that communication patterns need to become explicit in the source code to make parallel programs easier to analyze and partition. Extracting these patterns is difficult to automate because compiler techniques are limited in their ability to determine the effects of pointers. In this paper, we propose an OpenMP extension that allows programmers to explicitly declare the pointer-based data sharing between coarse-grain program parts. We present a dependency directive, which expresses the input and output relations between program parts and pointers to shared data, as well as a set of runtime operations that are necessary to enforce the declarations made by the programmer. The cost and scalability of the runtime operations are evaluated using micro-benchmarks and a benchmark from the NAS parallel benchmark suite. The measurements show that the overhead of the runtime operations is small. In fact, no performance degradation is found when the runtime operations are used in the benchmark from the NAS parallel benchmark suite.
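The abstract does not show the syntax of the proposed dependency directive, so the following is only a rough sketch of the general idea of declaring input and output relations between program parts and shared data. It uses the standard OpenMP 4.0 depend clause on tasks as a stand-in, not the authors' extension. The names produce, consume, run_pipeline and N are hypothetical and chosen purely for illustration.

```c
#include <stddef.h>

#define N 8  /* hypothetical buffer size, for illustration only */

/* Hypothetical "producer" part: writes 1..n into the shared buffer. */
static void produce(int *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)(i + 1);
    }
}

/* Hypothetical "consumer" part: reads the shared buffer and sums it. */
static long consume(const int *buf, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Runs both parts as tasks. The depend clauses state that the first task
 * outputs buf and the second takes buf as input, so the runtime orders
 * them. This is standard OpenMP 4.0, used here as an assumption-laden
 * stand-in for the paper's directive. Without OpenMP support the pragmas
 * are ignored and the two parts simply run in program order. */
long run_pipeline(void) {
    int buf[N];
    long sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(buf) depend(out: buf[0:N])
        produce(buf, N);

        #pragma omp task shared(buf, sum) depend(in: buf[0:N])
        sum = consume(buf, N);
    }
    /* The implicit barrier at the end of the parallel region guarantees
     * that both tasks have finished before sum is read. */
    return sum;
}
```

Calling run_pipeline() returns the sum of the values written by the producer. The point of the sketch is that the data flow through buf is visible in the source as a declaration, rather than having to be inferred by pointer analysis.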