Parallelization schemes for memory optimization on the cell processor: a case study on the Harris corner detector

Authors:
Tarik Saidani;Lionel Lacassagne;Joel Falcou;Claude Tadonki;Samir Bouaziz
Affiliations:
Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France
Venue:
Transactions on high-performance embedded architectures and compilers III
Year:
2011

Abstract

The Cell processor is a typical example of a heterogeneous on-chip multiprocessor architecture that uses several levels of parallelism to deliver high performance. Reducing the gap between peak and effective performance is the challenge for both software tool developers and application developers. Image processing and media applications are typical mainstream applications. We use the Harris algorithm, which detects interest points in an image, as a benchmark to compare the performance of several parallelization schemes on a Cell processor. The cost of DMA-controlled data transfers and of synchronization between SPEs explains the performance differences between the schemes. The scalability of the architecture is modeled and evaluated.
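
For readers unfamiliar with the benchmark, the following is a minimal, sequential sketch of the Harris corner response in plain Python. It is not the authors' implementation and shows none of the Cell-specific parallelization, DMA transfers, or SPE synchronization the paper studies. The function name `harris_response`, the use of 3x3 Sobel gradients, the 3x3 box window standing in for the usual Gaussian smoothing, and the constant `k = 0.04` are all assumptions chosen for illustration.

```python
def harris_response(img, k=0.04):
    """Harris corner response for a 2-D list of numbers (hypothetical helper).

    A 2-pixel border is left at 0.0 because the 3x3 gradient window and
    the 3x3 smoothing window do not both fit there.
    """
    h, w = len(img), len(img[0])

    # Step 1: per-pixel gradient products from 3x3 Sobel operators.
    ixx = [[0.0] * w for _ in range(h)]
    iyy = [[0.0] * w for _ in range(h)]
    ixy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            ixx[y][x] = gx * gx
            iyy[y][x] = gy * gy
            ixy[y][x] = gx * gy

    # Step 2: sum the products over a 3x3 window (assumed box filter),
    # then compute R = det(M) - k * trace(M)^2 for the 2x2 matrix M.
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    resp = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = sum(ixx[y + dy][x + dx] for dy, dx in offsets)
            syy = sum(iyy[y + dy][x + dx] for dy, dx in offsets)
            sxy = sum(ixy[y + dy][x + dx] for dy, dx in offsets)
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


# Usage: a 12x12 test image whose lower-right quadrant is bright,
# so the quadrant's corner sits at row 6, column 6.
img = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)]
       for y in range(12)]
r = harris_response(img)
```

In this sketch the response is positive at the corner of the bright quadrant, negative along a straight edge, and zero in flat regions, which is the property that makes it usable as an interest-point detector. Each output pixel depends only on a small neighborhood of the input, which is why the algorithm lends itself to the tile-based parallelization schemes compared in the paper.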