Performance Metrics for Embedded Parallel Pipelines

Authors: Martin Fleury; Andrew C. Downton; Adrian F. Clark
Affiliations: Univ. of Essex, Wivenhoe Park, UK (all authors)
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 2000

Abstract

A statistical approach to performance prediction is applied to a system development methodology for pipelines composed of independent parallel stages. The methodology is aimed at distributed-memory machines employing medium-grained parallelization. The target applications are continuous-flow embedded systems. The use of order statistics on this type of system is compared to previous practical usage, which appears largely confined to loop parallelization on traditional Non-Uniform Memory Access (NUMA) machines. A range of suitable performance metrics that give upper bounds or estimates for task durations is discussed. When included in prediction equations, the metrics have a practical role in checking fidelity to an application performance specification. An empirical study applies the mathematical findings to the performance of a multicomputer for a synchronous pipeline stage. The results of a simulation are given for larger numbers of processors. In a further simulation, the results are extended to take account of waiting-time distributions while data are buffered between stages of an asynchronous pipeline. Order statistics are also employed to estimate the degradation due to an output ordering constraint. Practical illustrations in the image communication and vision application domains are included.
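
As an illustration of how order statistics can bound task durations in a synchronous stage, the sketch below estimates the expected completion time of the slowest of p parallel tasks. It is not taken from the paper: the abstract does not state which metrics or distributions the authors use, so the exponential task-duration model, the choice of the classical distribution-free bound mean + std * (p - 1) / sqrt(2p - 1) on the expected maximum of p independent, identically distributed variables, and all function names are assumptions made here for illustration.

```python
import math
import random


def harmonic(p):
    """Return the p-th harmonic number H_p = 1 + 1/2 + ... + 1/p."""
    return sum(1.0 / k for k in range(1, p + 1))


def expected_max_exponential(mean, p):
    """Exact expected maximum of p i.i.d. exponential durations.

    Assumption for illustration: task durations are exponential with
    the given mean, so E[max] = mean * H_p.
    """
    return mean * harmonic(p)


def distribution_free_bound(mean, std, p):
    """Upper bound on the expected maximum of p i.i.d. durations.

    Uses the classical bound mean + std * (p - 1) / sqrt(2p - 1),
    which needs only the mean and standard deviation of one task.
    """
    return mean + std * (p - 1) / math.sqrt(2 * p - 1)


def simulated_max(mean, p, trials=20000, seed=1):
    """Monte Carlo estimate of the expected maximum (exponential model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(1.0 / mean) for _ in range(p))
    return total / trials


if __name__ == "__main__":
    mean = 1.0  # hypothetical mean task duration, arbitrary time units
    for p in (2, 4, 16, 64):
        exact = expected_max_exponential(mean, p)
        bound = distribution_free_bound(mean, mean, p)  # std == mean here
        sim = simulated_max(mean, p)
        print(f"p={p:3d} exact={exact:.3f} simulated={sim:.3f} bound={bound:.3f}")
```

Under these assumptions, a predicted stage time could be compared with a performance specification by substituting either the estimate or the bound for the slowest-task duration; the bound is looser but does not depend on knowing the task-duration distribution.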