This paper presents an analytical performance prediction model and methodology that can be used to predict the execution time, speedup, scalability and similar performance metrics of a large set of image processing operations running on a p-processor parallel system. The model, which requires only a few parameters obtainable on a minimal system, can help in the systematic design, evaluation and performance tuning of parallel image processing systems. Using the model, one can reason about the performance of a parallel image processing system prior to implementation. The method can also support programmers in detecting critical parts of an implementation, and system designers in predicting hardware performance and the effect of hardware parameter changes on performance. The execution of parallel image processing operations was studied, and the operations were arranged in three main problem classes based on data locality and the communication patterns of the algorithms. The core of the method is the derivation of the overhead function, as it is the overhead that determines the achievable speedup. The overheads were examined and modelled for each class. The use of the method is illustrated by four class-representative image processing algorithms: image-scalar addition, convolution, histogram calculation and the Fast Fourier Transform. The developed performance model has been validated on a 16-node parallel machine, and it has been shown that the model is able to predict the parallel run-time and other performance metrics of parallel image processing operations accurately.
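To illustrate why the overhead determines the achievable speedup, the following minimal sketch computes run time, speedup and efficiency from a given total overhead. It assumes the common textbook definition of total overhead, T_o = p * T_p - T_1, which the abstract does not state explicitly; the function name `predicted_metrics` and the numbers used are hypothetical, and the class-specific overhead functions derived in the paper are not reproduced here.

```python
def predicted_metrics(t_serial: float, t_overhead: float, p: int) -> dict:
    """Predict parallel run time, speedup and efficiency from a total overhead.

    Assumption (not taken from the paper): total overhead is defined as
    T_o = p * T_p - T_1, which gives T_p = (T_1 + T_o) / p.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    t_parallel = (t_serial + t_overhead) / p   # T_p
    speedup = t_serial / t_parallel            # S = T_1 / T_p
    efficiency = speedup / p                   # E = S / p = 1 / (1 + T_o / T_1)
    return {
        "t_parallel": t_parallel,
        "speedup": speedup,
        "efficiency": efficiency,
    }


# Hypothetical numbers: 8 s of serial work, 2 s of total overhead, 16 processors.
metrics = predicted_metrics(t_serial=8.0, t_overhead=2.0, p=16)
print(metrics)
```

Under this assumption, with zero overhead the speedup equals p. Any growth of the overhead relative to the serial work lowers the efficiency, which is why a model of the overhead is enough to predict the other metrics.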