A statistical approach to performance prediction is applied to a system development methodology for pipelines composed of independent parallel stages. The methodology is aimed at distributed-memory machines using medium-grained parallelization, and the target applications are continuous-flow embedded systems. The use of order statistics on this type of system is contrasted with previous practical usage, which appears largely confined to loop parallelization on traditional non-uniform memory access (NUMA) machines. A range of suitable performance metrics, which give upper bounds or estimates for task durations, is discussed. When included in prediction equations, these metrics have a practical role in checking conformance to an application's performance specification. An empirical study applies the mathematical findings to the performance of a synchronous pipeline stage on a multicomputer, and simulation results are given for larger numbers of processors. A further simulation extends the results to account for waiting-time distributions while data are buffered between the stages of an asynchronous pipeline. Order statistics are also used to estimate the degradation caused by an output-ordering constraint. Practical illustrations from the image-communication and vision application domains are included.
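The key observation behind order statistics here is that a synchronous pipeline stage finishes only when its slowest worker does. The stage time is therefore the maximum, or largest order statistic, of the per-worker task durations. The Python sketch below assumes independent, identically distributed task durations with known mean μ and standard deviation σ. It compares a Monte Carlo estimate of the expected maximum against two standard order-statistic bounds: the distribution-free Gumbel/Hartley-David bound μ + σ(p − 1)/√(2p − 1), and the bound μ + σ√(2 ln p), which holds for normally distributed durations. These two formulas, the normal task-duration model, and all function names are illustrative choices. They are not necessarily the metrics used in the paper.

```python
import math
import random
import statistics


def hartley_david_bound(mu: float, sigma: float, p: int) -> float:
    """Distribution-free upper bound on E[max of p i.i.d. durations]
    (Gumbel 1954; Hartley and David 1954). Needs only mean and std. dev."""
    return mu + sigma * (p - 1) / math.sqrt(2 * p - 1)


def normal_max_bound(mu: float, sigma: float, p: int) -> float:
    """Upper bound on E[max of p i.i.d. durations] when durations are
    normal; it is also the leading term of the large-p asymptotic estimate."""
    return mu + sigma * math.sqrt(2 * math.log(p))


def simulate_stage_time(mu: float, sigma: float, p: int,
                        trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of a synchronous stage's mean completion time.

    Assumption (hypothetical model): each of p workers draws one normally
    distributed task duration, and the stage ends when the slowest finishes.
    With mu much larger than sigma, negative draws are negligible.
    """
    rng = random.Random(seed)
    stage_times = [max(rng.gauss(mu, sigma) for _ in range(p))
                   for _ in range(trials)]
    return statistics.fmean(stage_times)


if __name__ == "__main__":
    mu, sigma = 10.0, 2.0  # hypothetical task-duration mean and std. dev. (ms)
    for p in (2, 8, 32, 128):
        sim = simulate_stage_time(mu, sigma, p)
        print(f"p={p:4d}  simulated={sim:6.2f}  "
              f"normal bound={normal_max_bound(mu, sigma, p):6.2f}  "
              f"Hartley-David={hartley_david_bound(mu, sigma, p):6.2f}")
```

The comparison shows why the choice of metric matters. For small p, the distribution-free bound is the tighter of the two (at p = 2 it is σ/√3 above the mean, against σ√(2 ln 2)). It grows roughly like σ√(p/2), however, so for large p the normal-case bound is far tighter, provided the normality assumption actually holds for the application's task durations.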