Optimal mapping of sequences of data parallel tasks

Authors:
Jaspal Subhlok;Gary Vondran
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh PA;Advanced LaserJet Operation, Hewlett Packard Company, Boise, ID
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995


Abstract

Many applications in a variety of domains including digital signal processing, image processing and computer vision are composed of a sequence of tasks that act on a stream of input data sets in a pipelined manner. Recent research has established that these applications are best mapped to a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire problem of mapping which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P^4 k^2) time. We also present a heuristic algorithm that is linear in the number of processors and establish with theoretical and practical results that the solutions obtained are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.
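
To make the mapping problem concrete, the following is a minimal illustrative sketch of a dynamic program that clusters a chain of tasks into contiguous modules and assigns processors to each module so that the slowest module (the pipeline bottleneck) is as fast as possible. It is not the algorithm from the paper: it ignores inter-task communication, memory constraints and module replication, and it runs in O(k^2 P^2) time rather than the O(P^4 k^2) stated in the abstract. The names map_pipeline, module_time and with_overhead, and the cost models used, are assumptions made for illustration only.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A module is (first_task, last_task_inclusive, processors_assigned).
Module = Tuple[int, int, int]


def map_pipeline(
    work: Sequence[float],
    num_procs: int,
    module_time: Optional[Callable[[int, int, int], float]] = None,
) -> Tuple[float, List[Module]]:
    """Simplified chain mapping (illustrative assumption, not the paper's method).

    module_time(i, j, p) is the time per data set of a module holding
    tasks i..j-1 on p processors. The default assumes perfect speedup.
    Returns (bottleneck_time, modules); throughput is 1 / bottleneck_time.
    """
    k = len(work)
    if k == 0 or num_procs < 1:
        raise ValueError("need at least one task and one processor")

    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)
    if module_time is None:
        def module_time(i: int, j: int, p: int) -> float:
            return (prefix[j] - prefix[i]) / p

    inf = float("inf")
    # best[j][q]: smallest bottleneck for the first j tasks using at most q processors.
    best = [[inf] * (num_procs + 1) for _ in range(k + 1)]
    choice: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (num_procs + 1) for _ in range(k + 1)
    ]
    for q in range(num_procs + 1):
        best[0][q] = 0.0  # no tasks placed yet; leftover processors may stay idle

    for j in range(1, k + 1):
        for q in range(1, num_procs + 1):
            for i in range(j):             # last module holds tasks i..j-1
                for p in range(1, q + 1):  # processors given to the last module
                    cand = max(best[i][q - p], module_time(i, j, p))
                    if cand < best[j][q]:
                        best[j][q] = cand
                        choice[j][q] = (i, p)

    # Walk the recorded choices backwards to recover the modules.
    modules: List[Module] = []
    j, q = k, num_procs
    while j > 0:
        i, p = choice[j][q]
        modules.append((i, j - 1, p))
        j, q = i, q - p
    modules.reverse()
    return best[k][num_procs], modules


# Hypothetical cost model: ideal speedup plus a per-processor overhead term.
work = [4.0, 2.0, 2.0]

def with_overhead(i: int, j: int, p: int) -> float:
    return sum(work[i:j]) / p + 0.5 * (p - 1)

bottleneck, modules = map_pipeline(work, 4, with_overhead)
print("bottleneck time per data set:", bottleneck)
print("modules (first task, last task, processors):", modules)
```

Under the perfect-speedup default, placing every task in one module that uses all processors is always optimal, so clustering only matters once the cost model penalizes large processor counts, as the hypothetical with_overhead model does here.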