Parallelization schemes for memory optimization on the Cell processor: a case study on the Harris corner detector

  • Authors:
  • Tarik Saidani;Lionel Lacassagne;Joel Falcou;Claude Tadonki;Samir Bouaziz

  • Affiliations:
  • Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France;Institut d'Electronique Fondamentale, Université de Paris Sud, Orsay Cedex, France

  • Venue:
  • Transactions on High-Performance Embedded Architectures and Compilers III
  • Year:
  • 2011

Abstract

The Cell processor is a typical example of a heterogeneous on-chip multiprocessor architecture that exploits several levels of parallelism to deliver high performance. Reducing the gap between peak and effective performance is the challenge facing both software tool developers and application developers. Image processing and media applications are typical mainstream applications. We use the Harris algorithm for detecting interest points in an image as a benchmark to compare the performance of several parallelization schemes on a Cell processor. The impact of DMA-controlled data transfers and of synchronization between SPEs explains the performance differences between the schemes. The scalability of the architecture is modeled and evaluated.
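
For readers unfamiliar with the benchmark, the Harris detector can be sketched as a chain of small image operators: gradients, pointwise products, smoothing, and a final corner response. The following is a minimal scalar sketch in plain Python, not the Cell/SPE implementation evaluated in the paper. The function names (`correlate3x3`, `harris_response`), the 3x3 Sobel and Gaussian kernels, the zeroed one-pixel border, and the constant `k = 0.04` are all assumptions chosen for illustration; the abstract does not specify them.

```python
# Illustrative kernels (assumed, not taken from the abstract).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]


def correlate3x3(img, kernel):
    """Slide a 3x3 kernel over a 2D list of numbers (cross-correlation).

    The one-pixel border of the result is left at 0.0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    acc += kernel[dy][dx] * img[y + dy - 1][x + dx - 1]
            out[y][x] = acc
    return out


def multiply(a, b):
    """Pointwise product of two images of the same size."""
    return [[pa * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def harris_response(img, k=0.04):
    """Return the Harris response map R = det(M) - k * trace(M)^2.

    M is the smoothed 2x2 matrix of gradient products at each pixel.
    k = 0.04 is a commonly used value, assumed here for illustration.
    """
    ix = correlate3x3(img, SOBEL_X)          # horizontal gradient
    iy = correlate3x3(img, SOBEL_Y)          # vertical gradient
    sxx = correlate3x3(multiply(ix, ix), GAUSS)
    syy = correlate3x3(multiply(iy, iy), GAUSS)
    sxy = correlate3x3(multiply(ix, iy), GAUSS)
    h, w = len(img), len(img[0])
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            det = sxx[y][x] * syy[y][x] - sxy[y][x] ** 2
            trace = sxx[y][x] + syy[y][x]
            resp[y][x] = det - k * trace * trace
    return resp


if __name__ == "__main__":
    # 12x12 test image: a bright quadrant whose corner sits at (6, 6).
    image = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)]
             for y in range(12)]
    r = harris_response(image)
    print("corner:", r[6][6], "edge:", r[9][6], "flat:", r[3][3])
```

In this sketch the response is positive at the corner of the bright quadrant, negative along a straight edge, and zero in flat regions. Every stage reads a small neighbourhood of the previous stage's output, which gives a sense of why the way intermediate images are moved and shared between processing elements matters for this benchmark.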