Folding spatial image filters on the CM-5

Authors:
Sandra G. Dykes;Xiaodong Zhang
Venue:
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Year:
1995

Cited 1

Distributed Edge Detection: Issues and Implementations

IEEE Computational Science & Engineering

Abstract

This paper presents an efficient data-parallel algorithm for general convolutions, and compares its performance on the CM-5 to FFT frequency filtering. Sequential FFT filters are faster than sequential convolutions for windows beyond a very small size, typically 6×6 pixels. Our folded convolution algorithm shifts the convolution/FFT performance crossover to much larger filter sizes. For 256×256 images on a 512-node CM-5, the folded convolution is faster than FFT filtering up to 36×36 windows. Results are reported for a naively implemented convolution, our folded convolution with default and optimized memory layouts, and FFT filtering using FFTs from the CM-5 scientific library (CMSSL). The data yield two important results:

1. Parallel convolutions on the CM-5 are faster than FFT filtering for a substantial and important range of window sizes. This is in contrast to sequential systems, where convolutions are more efficient only for very small windows.
2. Considerable performance gains are realized by folding the convolution and optimizing layout.
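
The two filtering approaches compared in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's folded algorithm and does not use CMSSL or any CM-5 facility; the function names `direct_circular_convolve` and `fft_circular_convolve` are hypothetical, and periodic (wrap-around) image boundaries are assumed so that both methods compute exactly the same result. The spatial version is written as whole-array shift-and-accumulate steps, which is roughly the shape a data-parallel convolution takes, and its cost grows with the window area, while the FFT version costs about the same for any window size.

```python
import numpy as np

def direct_circular_convolve(image, kernel):
    """Spatial-domain convolution with wrap-around boundaries (an assumption).

    One whole-image shift and multiply-add per kernel element, so the work
    is proportional to (image pixels) x (window area).
    """
    out = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll moves pixel (m - i, n - j) to position (m, n).
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

def fft_circular_convolve(image, kernel):
    """Frequency-domain filtering via the convolution theorem.

    The kernel is zero-padded to the image size; the cost is dominated by
    the transforms and is essentially independent of the window size.
    """
    padded = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))    # small stand-in for a 256x256 image
    window = rng.random((6, 6))     # the sequential crossover size cited above
    a = direct_circular_convolve(image, window)
    b = fft_circular_convolve(image, window)
    print("methods agree:", np.allclose(a, b))
```

Because the spatial loop runs once per window element while the FFT path does not depend on the window size, some crossover window size must exist; where it falls depends on the machine and implementation, which is the quantity the paper measures on the CM-5.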