This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massively parallel, memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements connected by a flexible switching network, and it executes 2-bit, 2,048-way bit-serial and word-parallel operations with a single command. To suit this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved, and the one-dimensional (1D) DCT processing in the vertical and horizontal directions has been further developed. Evaluation of the matrix-engine-based DCT processing shows that the number of clock cycles required per image block can be reduced by 87% compared with a conventional DSP architecture. The resulting performance in MOPS and area efficiency in MOPS/mm² are 8 and 5.6 times higher, respectively, than those of a conventional DSP.
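The two ideas the abstract combines can be illustrated with a small Python sketch. This is not the paper's method, and it does not reproduce its rearranged arithmetic order. It only shows the textbook separable structure of the 2D DCT and a software model of 2-bit-slice, word-parallel addition. The following are assumptions not stated in the abstract: an 8x8 block, the orthonormal DCT-II, 16-bit unsigned words, and the hypothetical function names `dct_1d`, `dct_2d_row_column`, and `bit_serial_add`. The separable form matters because one identical 1D-DCT is applied to every column and then to every row, which is the kind of uniform work a word-parallel SIMD engine can broadcast. In the addition model, a `width`-bit add needs `width // slice_bits` steps whether there are 3 lanes or 2,048.

```python
import math

BLOCK = 8  # assumed 8x8 block size (JPEG-style); not stated in the abstract


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def dct_2d_row_column(block):
    """2D DCT computed as a vertical pass (1D-DCT of every column)
    followed by a horizontal pass (1D-DCT of every row)."""
    n = len(block)
    # Vertical pass: cols[c] holds the 1D-DCT of column c.
    cols = [dct_1d([block[r][c] for r in range(n)]) for c in range(n)]
    # Transpose back so that vert[r][c] is vertical frequency r of column c.
    vert = [[cols[c][r] for c in range(n)] for r in range(n)]
    # Horizontal pass: 1D-DCT of each row of the intermediate result.
    return [dct_1d(row) for row in vert]


def bit_serial_add(a_words, b_words, width=16, slice_bits=2):
    """Hypothetical model of word-parallel, bit-serial addition.

    Each step processes one `slice_bits`-wide slice of every word, applying the
    same operation to all lanes (as one SIMD command would). Results wrap
    modulo 2**width. Returns (sums, number_of_steps)."""
    assert width % slice_bits == 0, "width must be a multiple of slice_bits"
    mask = (1 << slice_bits) - 1
    lanes = len(a_words)
    result = [0] * lanes
    carry = [0] * lanes
    steps = 0
    for shift in range(0, width, slice_bits):
        for lane in range(lanes):  # conceptually simultaneous across all PEs
            a = (a_words[lane] >> shift) & mask
            b = (b_words[lane] >> shift) & mask
            s = a + b + carry[lane]
            result[lane] |= (s & mask) << shift
            carry[lane] = s >> slice_bits  # carry ripples into the next slice
        steps += 1
    return result, steps
```

For example, `bit_serial_add([1000, 65535], [234, 1])` returns `([1234, 0], 8)`: eight 2-bit steps cover a 16-bit word, and the second lane wraps around modulo 2**16.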