Parallel algorithms for super performance

Authors:
D. J. Shakshober
Affiliations:
Digital Equipment Corporation, BXB2-2/G08, 60 Codman Hill Road, Boxboro, MA
Venue:
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Year:
1989

Abstract

This paper describes the development of parallel algorithms on M31, a large-scale, shared-memory multiprocessor VAX computer. Matrix operations were optimized for a subset of the BLAS, the Basic Linear Algebra Subprograms. Efficient image processing algorithms were also developed for parallel convolution, correlation, and fast Fourier transforms (non-synchronizing one- and two-dimensional FFTs). The effect of matrix partitioning was examined using two different memory allocation strategies. We found that contiguous memory partitioning can yield performance gains beyond linear speedup. This super performance was achieved through a parallel algorithm devised to minimize cache replacements: fewer replacements allowed high CPU utilization with minimal system overhead. Inefficient matrix partitioning, by contrast, tended to stifle parallel performance because frequent cache misses created heavy bus traffic and thus increased system overhead.
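
The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name its two memory allocation strategies, so the interleaved scheme below is an assumed contrast to contiguous partitioning, and the function names (`contiguous_partition`, `interleaved_partition`, `parallel_matvec`) are hypothetical. Python threads only show which rows each worker touches. They do not reproduce the cache or bus behavior of a shared-memory multiprocessor.

```python
from concurrent.futures import ThreadPoolExecutor


def contiguous_partition(n_rows, n_workers):
    """Give each worker one contiguous block of row indices."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if w < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def interleaved_partition(n_rows, n_workers):
    """Assumed contrast: deal rows out round-robin, so each worker's
    rows are scattered across the whole matrix."""
    return [list(range(w, n_rows, n_workers)) for w in range(n_workers)]


def matvec_rows(matrix, vector, rows):
    """Compute the matrix-vector product for the given rows only."""
    return [(r, sum(a * x for a, x in zip(matrix[r], vector))) for r in rows]


def parallel_matvec(matrix, vector, partition):
    """Run one task per partition entry and gather the results by row."""
    result = [0] * len(matrix)
    with ThreadPoolExecutor(max_workers=len(partition)) as pool:
        chunks = pool.map(lambda rows: matvec_rows(matrix, vector, rows),
                          partition)
        for chunk in chunks:
            for r, value in chunk:
                result[r] = value
    return result
```

Both partitions produce the same product. The difference is only in the rows each worker reads: with a contiguous block, a worker walks one compact region of memory, which is the property the abstract associates with fewer cache replacements.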