Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing

  • Authors:
  • Svetislav Momcilovic; Leonel Sousa

  • Affiliations:
  • INESC-ID IST/TULisbon, Lisboa, Portugal; INESC-ID IST/TULisbon, Lisboa, Portugal

  • Venue:
  • Journal of Signal Processing Systems
  • Year:
  • 2011

Abstract

Local processing, the dominant type of processing in image and video applications, requires huge computational power to be performed in real time. However, processing locality, in space and/or time, makes it possible to exploit data parallelism and data reuse. These properties can be exploited to achieve high-performance image and video processing on multi-core processors, but suitable models and parallel algorithms must first be developed, in particular for non-shared memory architectures. This paper proposes an efficient and simple model for local image and video processing on non-shared memory multi-core architectures. The model adopts a single program, multiple data (SPMD) approach, in which data is distributed, processed and reused in an optimal way with respect to the data size, the number of cores and the local memory capacity. The model was experimentally evaluated by developing local video processing algorithms, namely advanced video motion estimation and in-loop deblocking filtering, and programming them on the Cell Broadband Engine multi-core processor. Furthermore, based on these experiences, the paper addresses the main challenges of vectorization and of reducing branch mispredictions and computational load imbalances. The limits and advantages of regular and adaptive algorithms are also discussed. Experimental results show that the proposed model is well suited to local video processing, and that real-time performance is achieved even for the most demanding parts of advanced video coding. Full-pixel motion estimation is performed on high-resolution video (720×576 pixels) at 30 frames per second, considering large search areas and five reference frames.
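The abstract does not detail how the model distributes and reuses data, so the following is only a minimal sketch, under stated assumptions, of one common way to do SPMD local processing on cores with private local memory. It is not the authors' method. It assumes that each core receives a contiguous band of frame rows, that the local operation reads at most `radius` rows above and below each output row (a "halo"), and that a fixed byte budget of local memory holds the input row window. The names `CorePlan`, `partition_rows`, `rows_per_chunk`, `chunk_schedule` and the `radius`/`window_bytes` parameters are hypothetical. On the Cell/BE, each SPE has a 256 KB local store, and the row fetches would be DMA transfers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class CorePlan:
    first_row: int  # first output row assigned to this core
    num_rows: int   # number of output rows assigned to this core

def partition_rows(height: int, cores: int) -> List[CorePlan]:
    """Split `height` rows into contiguous bands, as evenly as possible.
    Every core then runs the same code on its own band (SPMD)."""
    base, extra = divmod(height, cores)
    plans, row = [], 0
    for c in range(cores):
        n = base + (1 if c < extra else 0)  # the first `extra` cores take one more row
        plans.append(CorePlan(row, n))
        row += n
    return plans

def rows_per_chunk(width: int, radius: int, window_bytes: int,
                   bytes_per_pixel: int = 1) -> int:
    """Output rows per step whose input window (those rows plus `radius`
    halo rows above and below) fits in `window_bytes` of local memory.
    Returns 0 if not even one output row fits."""
    window_rows = window_bytes // (width * bytes_per_pixel)
    return max(0, window_rows - 2 * radius)

def chunk_schedule(plan: CorePlan, chunk_rows: int, radius: int,
                   height: int) -> List[Tuple[int, int, int, int]]:
    """Return (out_start, out_end, fetch_start, fetch_end) half-open row ranges.
    Input rows already fetched for the previous chunk (the overlapping halo)
    are assumed to stay in local memory, so only new rows are fetched."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    schedule = []
    fetched_until: Optional[int] = None  # end of the input rows already held locally
    band_end = plan.first_row + plan.num_rows
    for out_start in range(plan.first_row, band_end, chunk_rows):
        out_end = min(out_start + chunk_rows, band_end)
        in_start = max(0, out_start - radius)   # clamp at the top frame border
        in_end = min(height, out_end + radius)  # clamp at the bottom frame border
        fetch_start = in_start if fetched_until is None else max(in_start, fetched_until)
        schedule.append((out_start, out_end, fetch_start, in_end))
        fetched_until = in_end
    return schedule
```

For example, with a 720×576 frame of 8-bit samples split across six cores (an arbitrary choice), `radius=8` and a 32 KiB window budget, `rows_per_chunk` gives 29 output rows per step. Core 0's band of 96 rows is then processed in four chunks, and each of its 104 input rows (96 plus 8 halo rows below) is fetched exactly once. Keeping the halo resident rather than re-fetching it is what makes the reuse pay off when `radius` is large relative to the chunk height.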