Performance implications of synchronization structure in parallel programming

Authors:
Arturo González-Escribano;Arjan J. C. van Gemund;Valentín Cardeñoso-Payo
Affiliations:
Dept. de Informática, Universidad de Valladolid, E.T.I.T. Campus Miguel Delibes, 47011 Valladolid, Spain;Faculty of Electrical Engineering, Mathematics, and Computer Science, P.O. Box 5031, NL-2600 GA Delft, The Netherlands;Dept. de Informática, Universidad de Valladolid, E.T.I.T. Campus Miguel Delibes, 47011 Valladolid, Spain
Venue:
Parallel Computing
Year:
2009

Abstract

The restricted synchronization structure of so-called structured parallel programming paradigms has an advantageous effect on programmer productivity, cost modeling, and scheduling complexity. However, imposing these restrictions can lead to a loss of parallelism compared with a programming approach that does not impose synchronization structure. In this paper we study the potential loss of parallelism when parallel computations are expressed in a programming model that limits the computation graph (DAG) to a series-parallel topology, which characterizes all well-known structured programming models. We present an analytical model that approximately captures this loss of parallelism in terms of simple parameters related to DAG topology and workload distribution. We validate the model using a wide range of synthetic and real-world parallel computations running on shared-memory and distributed-memory machines. Although the loss of parallelism is theoretically unbounded, our measurements show that for all of these applications the performance loss due to choosing a series-parallel structured model is invariably limited to at most 10%. In all cases, the loss of parallelism is predictable provided the topology and workload variability of the DAG are known.
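
To make the notion of "loss of parallelism" concrete, the following minimal sketch compares the critical path of a small non-series-parallel DAG with the critical path obtained after forcing it into a series-parallel shape. It rests on several assumptions that do not come from the paper: the function names `critical_path` and `layered_critical_path` are hypothetical, the four-task "N"-shaped graph and its workloads are invented for illustration, and the conversion used here (a full barrier between consecutive depth levels) is only one simple way to obtain a series-parallel structure, not necessarily the transformation the authors study.

```python
from functools import lru_cache


def _predecessors(work, edges):
    """Map each task to the list of tasks it depends on."""
    preds = {n: [] for n in work}
    for u, v in edges:
        preds[v].append(u)
    return preds


def critical_path(work, edges):
    """Longest weighted path in the DAG (node weights are task workloads)."""
    preds = _predecessors(work, edges)

    @lru_cache(maxsize=None)
    def finish(n):
        # Earliest finish time with unlimited processors.
        return work[n] + max((finish(p) for p in preds[n]), default=0)

    return max(finish(n) for n in work)


def layered_critical_path(work, edges):
    """Critical path after inserting a full barrier between depth levels.

    Assumption: this layering is an illustrative series-parallel
    conversion, not the paper's own algorithm.
    """
    preds = _predecessors(work, edges)

    @lru_cache(maxsize=None)
    def depth(n):
        # Sources get depth 0; every other task sits one level below
        # its deepest predecessor.
        return 1 + max((depth(p) for p in preds[n]), default=-1)

    level_max = {}
    for n in work:
        d = depth(n)
        level_max[d] = max(level_max.get(d, 0), work[n])
    # Each level lasts as long as its slowest task.
    return sum(level_max.values())


# Hypothetical "N"-shaped DAG: a -> c, a -> d, b -> d (not series-parallel).
work = {"a": 1, "b": 3, "c": 3, "d": 1}
edges = [("a", "c"), ("a", "d"), ("b", "d")]

original = critical_path(work, edges)
structured = layered_critical_path(work, edges)
loss_ratio = structured / original
print(original, structured, loss_ratio)
```

In this deliberately contrived case the barrier makes task `c` wait for task `b`, which it never depended on, so the critical path grows from 4 to 6. The workloads were chosen to exaggerate the effect; the example illustrates why the loss depends on both topology and workload variability, and it says nothing about the magnitudes measured in the paper.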