Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

  • Authors:
  • Takeshi Kumaki;Masakatsu Ishizaki;Tetsushi Koide;Hans Jürgen Mattausch;Yasuto Kuroda;Hideyuki Noda;Katsumi Dosaka;Kazutami Arimoto;Kazunori Saito

  • Venue:
  • IEICE - Transactions on Information and Systems
  • Year:
  • 2007

Abstract

This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved, and the vertical/horizontal-space one-dimensional (1D) DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The measured performance in MOPS and MOPS/mm² is better than that of a conventional DSP by factors of 8 and 5.6, respectively.
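
The vertical/horizontal 1D-DCT processing mentioned in the abstract relies on the separability of the 2D DCT: a block can be transformed by a 1D pass over its rows followed by a 1D pass over its columns. The following is a minimal Python sketch of that general row-column decomposition only. It does not reproduce the paper's reordered algorithm or the bit-serial matrix engine; the 8x8 block size, the orthonormal DCT-II scaling, and the function names `dct_1d`, `transpose`, and `dct_2d` are assumptions made for illustration.

```python
import math

N = 8  # assumed block size (common in image coding); not stated in the abstract


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly in O(n^2)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def transpose(block):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2D DCT as a horizontal 1D pass (rows), then a vertical 1D pass (columns)."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    # A constant block should put all of its energy into the DC coefficient.
    print(round(coeffs[0][0], 6))
```

Because every row (and then every column) goes through the same 1D transform independently, the work maps naturally onto SIMD hardware, where many rows or columns can be processed in parallel under one instruction stream.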