The restricted synchronization structure of so-called structured parallel programming paradigms benefits programmer productivity, cost modeling, and scheduling complexity. However, imposing these restrictions can lead to a loss of parallelism compared to a programming approach that does not impose synchronization structure. In this paper we study the potential loss of parallelism when parallel computations are expressed in a programming model that limits the computation graph (a directed acyclic graph, DAG) to a series-parallel topology, which characterizes all well-known structured programming models. We present an analytical model that approximately captures this loss of parallelism in terms of simple parameters related to DAG topology and workload distribution. We validate the model using a wide range of synthetic and real-world parallel computations running on shared-memory and distributed-memory machines. Although the loss of parallelism is theoretically unbounded, our measurements show that for all of these applications the performance loss due to choosing a series-parallel structured model is at most 10%. In all cases, the loss of parallelism is predictable, provided the topology and workload variability of the DAG are known.
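To make the notion of "loss of parallelism" concrete, the following minimal sketch compares the critical path of a small non-series-parallel DAG with that of a series-parallel version obtained by adding one synchronization edge. It is only an illustration and not the analytical model of the paper: the four-task "N"-shaped graph, the task weights, and the names `critical_path`, `n_graph`, and `sp_graph` are all assumptions chosen for the example, and communication costs and processor limits are ignored.

```python
from functools import lru_cache


def critical_path(weights, edges):
    """Return the length of the longest weighted path in a DAG.

    weights maps each task to its execution time; edges is a list of
    (predecessor, successor) pairs. With unlimited processors and no
    communication cost, this length is the best achievable run time.
    """
    successors = {task: [] for task in weights}
    for pred, succ in edges:
        successors[pred].append(succ)

    @lru_cache(maxsize=None)
    def longest_from(task):
        # Time of this task plus the longest chain that must follow it.
        tail = max((longest_from(s) for s in successors[task]), default=0)
        return weights[task] + tail

    return max(longest_from(task) for task in weights)


# Hypothetical task times, chosen so that the added edge hurts.
weights = {"a": 3, "b": 1, "c": 1, "d": 3}

# "N"-shaped dependence graph: c needs a and b, d needs only b.
# This shape cannot be written as a series-parallel composition.
n_graph = [("a", "c"), ("b", "c"), ("b", "d")]

# Series-parallel version: add a -> d, i.e. a barrier between
# the parallel pair (a, b) and the parallel pair (c, d).
sp_graph = n_graph + [("a", "d")]

cp_original = critical_path(weights, n_graph)
cp_sp = critical_path(weights, sp_graph)
loss = cp_sp / cp_original

print(f"original critical path: {cp_original}")
print(f"series-parallel critical path: {cp_sp}")
print(f"slowdown factor: {loss:.2f}")
```

With these deliberately unbalanced weights the extra edge lengthens the critical path; with equal task times the two graphs have the same critical path, which is consistent with the abstract's point that the loss depends on both topology and workload variability.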