DSP Processor Fundamentals: Architectures and Features
Subword Parallelism with MAX-2
IEEE Micro
ASAP '00 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors
A Register File with Transposed Access Mode
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
On Design of Parallel Memory Access Schemes for Video Coding
Journal of VLSI Signal Processing Systems
MMX-Based DCT and MC Algorithms for Real-Time Pure Software MPEG Decoding
ICMCS '99 Proceedings of the IEEE International Conference on Multimedia Computing and Systems - Volume 2
Architecture and applications of the HiPAR video signal processor
IEEE Transactions on Circuits and Systems for Video Technology
A design study of a 0.25-μm video signal processor
IEEE Transactions on Circuits and Systems for Video Technology
Configurable data memory for multimedia processing
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Current video compression standards, which process frames macroblock by macroblock, use several processing functions to achieve compression, and these functions access the data memory address space in different ways. For example, motion estimation and motion compensation frequently require data accesses that are not aligned to word boundaries. The Discrete Cosine Transform (DCT) and its inverse (IDCT) for an 8 × 8 block, on the other hand, can be computed first over the rows and then over the columns, so the intermediate block must be transposed between these two stages. A parallel memory architecture can support both of these access patterns, among others. In a companion paper, we briefly surveyed parallel memory architectures and proposed parallel memory designs for different data path widths in video coding applications. In this paper, we construct example implementations of video coding functions that use the proposed parallel data memory efficiently. We also estimate the performance and implementation cost of the parallel memory architecture and compare them with those of more conventional memory architectures. The examples are given for data bus widths of 16, 32, 64, and 128 bits. We show that in many video coding function implementations the parallel memory can keep the data path, and thus the processing resources, fully utilized, which ensures high-speed operation.
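The row-column decomposition of the 8 × 8 DCT can be sketched as follows to show exactly where the transposition falls. This is a minimal illustration only, assuming the orthonormal DCT-II definition; the function names are hypothetical, and naive sums are used for clarity rather than the fast factorized DCTs a real codec would use. The `transpose` call between the two 1-D passes is the step that needs column-wise (transposed) access to the intermediate block.

```python
import math

N = 8  # 8 x 8 blocks, as in the abstract


def basis(k, i, n=N):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct_1d(x):
    """1-D DCT-II of one row (naive O(n^2) sum for clarity)."""
    n = len(x)
    return [sum(x[i] * basis(k, i, n) for i in range(n)) for k in range(n)]


def transpose(block):
    """Swap rows and columns: the access pattern a transposing memory must serve."""
    return [list(col) for col in zip(*block)]


def dct_2d_row_column(block):
    """Separable 2-D DCT: row pass, transpose, second row pass (= column pass), transpose back."""
    rows_done = [dct_1d(row) for row in block]                 # stage 1: rows
    cols_done = [dct_1d(row) for row in transpose(rows_done)]  # stage 2: columns via transposition
    return transpose(cols_done)                                # result indexed [u][v]


def dct_2d_direct(block):
    """Reference: direct (non-separated) 2-D DCT-II, used to check the separable version."""
    n = len(block)
    return [[sum(block[y][x] * basis(u, y, n) * basis(v, x, n)
                 for y in range(n) for x in range(n))
             for v in range(n)]
            for u in range(n)]


if __name__ == "__main__":
    block = [[(3 * y + 5 * x) % 17 for x in range(N)] for y in range(N)]
    a = dct_2d_row_column(block)
    b = dct_2d_direct(block)
    err = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
    print("max difference between separable and direct DCT:", err)
```

The final `transpose` only restores the `[u][v]` index order; a hardware or software implementation could instead fold it into how the coefficients are subsequently read, but the transposition between the two passes cannot be avoided in the row-column scheme.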
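The abstract does not state which module assignment the proposed parallel memory uses. As a hedged illustration of why a parallel memory can serve both unaligned and transposed accesses, the sketch below models a hypothetical memory of 8 byte-wide modules (8 pixels, i.e. 64 bits, per access) using a classic linear skewing scheme, and compares it with plain low-order interleaving. `N_MODULES`, `WIDTH`, and all function names are assumptions made for this example, not the authors' design; each module is assumed to deliver one pixel per cycle from its own address.

```python
from collections import Counter

N_MODULES = 8   # hypothetical: 8 byte-wide modules -> 8 pixels (64 bits) per access
WIDTH = 64      # hypothetical frame width in pixels; assumed to be a multiple of N_MODULES


def interleaved_module(y, x):
    """Plain low-order interleaving: the module depends on the column only."""
    return x % N_MODULES


def skewed_module(y, x):
    """Linear skewing: each row is rotated by one module relative to the previous row."""
    return (x + y) % N_MODULES


def address_in_module(y, x):
    """Word address inside the selected module (shared by both mappings)."""
    return (y * WIDTH + x) // N_MODULES


def is_one_to_one(module_fn, height=16):
    """Check that no two pixels of a WIDTH x height frame share a (module, address) slot."""
    slots = {(module_fn(y, x), address_in_module(y, x))
             for y in range(height) for x in range(WIDTH)}
    return len(slots) == WIDTH * height


def cycles_needed(coords, module_fn):
    """Cycles to fetch all pixels if each module delivers one pixel per cycle."""
    load = Counter(module_fn(y, x) for y, x in coords)
    return max(load.values())


def row_access(y, x0):
    """N_MODULES consecutive pixels of row y, starting at any (possibly unaligned) column x0."""
    return [(y, x0 + k) for k in range(N_MODULES)]


def column_access(y0, x):
    """N_MODULES vertically adjacent pixels of column x, as read when transposing a block."""
    return [(y0 + k, x) for k in range(N_MODULES)]


if __name__ == "__main__":
    for name, fn in [("interleaved", interleaved_module), ("skewed", skewed_module)]:
        print(name,
              "aligned row:", cycles_needed(row_access(3, 8), fn),
              "unaligned row:", cycles_needed(row_access(3, 13), fn),
              "column:", cycles_needed(column_access(0, 5), fn))
```

Both mappings fetch an aligned or unaligned row segment in a single cycle; for the segment starting at column 13, some modules read word address 25 and others word address 26, which is why each module needs its own address. Only the skewed mapping also returns a column in one cycle: the interleaved mapping places all eight pixels of column 5 in module 5 and needs eight cycles. A real skewed design would additionally need a permutation (rotation) network to put the module outputs back into pixel order; that part is omitted here.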