Efficient 2D and 3D watershed on graphics processing unit: block-asynchronous approaches based on cellular automata

Authors:
Pablo Quesada-Barriuso;Dora B. Heras;Francisco Argüello
Venue:
Computers and Electrical Engineering
Year:
2013


Abstract

The watershed transform is a method for unsupervised image segmentation. In this paper we show that a watershed algorithm based on a cellular automaton is well suited to recent GPU architectures, especially when the synchronization rules are relaxed. In particular, we propose a block-asynchronous computation strategy that maps the cellular automaton onto the thread blocks of the GPU. This method reduces the number of global synchronization points, allowing efficient exploitation of the GPU memory hierarchy. We also avoid the artifacts that the block-asynchronous updating scheme produces in the watershed lines by correcting the data propagation speed among the blocks. The proposals are compared to an OpenMP multithreaded code. The high speedups indicate the potential of this kind of algorithm for new architectures based on hundreds of cores. The method is also adapted to 3D volumes, obtaining similar results.
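The block-asynchronous idea can be illustrated with a minimal CPU sketch in Python. It is not the authors' algorithm: it assumes a much simpler cellular-automaton rule (each cell copies the label of its lowest strictly lower 4-neighbour), it does not handle plateaus, and all function names (`descent_targets`, `watershed_sync`, `watershed_block_async`, `make_demo_image`) are hypothetical. The `tile` parameter stands in for a GPU thread block: each tile iterates to local convergence on its own cells, reading values from other tiles only as they were at the last global synchronization.

```python
def descent_targets(img):
    """For each cell, the position of its lowest strictly lower 4-neighbour,
    or its own position if it has none (treated here as a seed)."""
    h, w = len(img), len(img[0])
    target = [[(y, x) for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            best = img[y][x]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx] < best:
                    best = img[ny][nx]
                    target[y][x] = (ny, nx)
    return target


def watershed_sync(img):
    """Fully synchronous automaton: one global synchronization per step."""
    h, w = len(img), len(img[0])
    target = descent_targets(img)
    labels = [[y * w + x for x in range(w)] for y in range(h)]
    steps = 0
    while True:
        new = [[labels[ty][tx] for (ty, tx) in row] for row in target]
        if new == labels:
            return labels, steps
        labels = new
        steps += 1


def watershed_block_async(img, tile):
    """Block-asynchronous variant: tiles converge locally between global syncs."""
    h, w = len(img), len(img[0])
    target = descent_targets(img)
    labels = [[y * w + x for x in range(w)] for y in range(h)]
    passes = 0
    while True:
        # Values of other tiles are only visible as of the last global sync.
        snapshot = [row[:] for row in labels]
        changed = False
        for y0 in range(0, h, tile):
            for x0 in range(0, w, tile):
                local_changed = True
                while local_changed:  # iterate this tile until it is stable
                    local_changed = False
                    for y in range(y0, min(y0 + tile, h)):
                        for x in range(x0, min(x0 + tile, w)):
                            ty, tx = target[y][x]
                            inside = (y0 <= ty < y0 + tile
                                      and x0 <= tx < x0 + tile)
                            src = labels if inside else snapshot
                            if labels[y][x] != src[ty][tx]:
                                labels[y][x] = src[ty][tx]
                                local_changed = True
                                changed = True
        if not changed:
            return labels, passes
        passes += 1


def make_demo_image(size=8):
    """Hypothetical test image with two basins, centred at (1, 1) and (6, 5)."""
    return [[min((x - 1) ** 2 + (y - 1) ** 2,
                 (x - 6) ** 2 + (y - 5) ** 2 + 1)
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    img = make_demo_image()
    _, steps = watershed_sync(img)
    _, passes = watershed_block_async(img, tile=4)
    print("global syncs (synchronous):", steps)
    print("global syncs (block-asynchronous):", passes)
```

With this simplified rule the final labelling does not depend on the update order, so both variants produce the same segmentation and the sketch does not reproduce the watershed-line artifacts mentioned in the abstract. It only shows how letting each block iterate locally can reduce the number of global synchronization points.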