On Design of Parallel Memory Access Schemes for Video Coding

  • Authors:
  • Jarno K. Tanskanen; Reiner Creutzburg; Jarkko T. Niittylahti

  • Affiliations:
  • Department of Information Technology, Institute of Digital and Computer Systems, Tampere University of Technology, Tampere, Finland FIN-33101; Department of Computer Science, Fachhochschule Brandenburg, University of Applied Sciences, Brandenburg, Germany D-14737; Department of Information Technology, Tampere University of Technology, Institute of Digital and Computer Systems, Tampere, Finland FIN-33101

  • Venue:
  • Journal of VLSI Signal Processing Systems
  • Year:
  • 2005

Abstract

Some modern high-performance digital signal processors (DSPs) have byte-addressable internal data memory. This property is especially valuable in computationally demanding inter-frame video encoding, where data accesses are typically unaligned with respect to word boundaries. Byte-addressable memory allows a load or store instruction to start accessing from any byte address, providing at most as many successive bytes from subsequent addresses as the data bus can handle in parallel. Perhaps the simplest way to construct such a byte-addressable memory is to use N 8-bit memory modules, or banks, accessed in parallel, where N is the data bus width in bytes. However, beyond accessing successive bytes from an arbitrary byte address, a memory consisting of parallel memory modules can provide much more versatile addressing capabilities at a reasonable implementation cost. Versatile access formats can significantly reduce the need for data reordering in the register file. First, we motivate the use of a parallel memory architecture with versatile access formats as the internal on-chip data memory of a modern DSP. Next, the notation is described and a general view of parallel memory design is given. We propose several example parallel data memory architecture designs with data access formats especially helpful in H.263 encoding and in MPEG-4 core profile motion and texture encoding. The examples are given for different data bus widths (16, 32, 64, and 128 bits). Finally, performance is briefly compared to that of other memory architectures, and area, delay, and power figures are estimated.
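To make the N-bank idea concrete, here is a minimal behavioral sketch in Python. It assumes plain low-order interleaving, where byte address `i` lives in bank `i % N` at row `i // N`; this is an illustrative choice, not necessarily the module assignment used in the paper. The names `BankedByteMemory`, `module`, `row`, `load`, and `is_conflict_free` are hypothetical. An unaligned load of N successive bytes touches every bank exactly once, so it completes in one parallel access; the result only needs rotating so that bank `k` supplies output byte `(k - addr) % N`.

```python
from typing import Iterable, List


class BankedByteMemory:
    """Behavioral model of N 8-bit banks accessed in parallel.

    Assumption (hypothetical, for illustration): low-order interleaving,
    i.e. byte address i is stored in bank i % N at row i // N.
    """

    def __init__(self, n_banks: int, rows: int) -> None:
        self.n = n_banks
        self.banks: List[bytearray] = [bytearray(rows) for _ in range(n_banks)]

    def module(self, addr: int) -> int:
        # Module assignment function: which bank holds this byte.
        return addr % self.n

    def row(self, addr: int) -> int:
        # Address function: which row inside that bank.
        return addr // self.n

    def write_bytes(self, addr: int, data: bytes) -> None:
        # Sequential helper to fill the memory with test data.
        for j, b in enumerate(data):
            self.banks[self.module(addr + j)][self.row(addr + j)] = b

    def load(self, addr: int) -> bytes:
        """Unaligned load of N successive bytes from any byte address.

        Each bank is read exactly once; bank k supplies the byte at
        position (k - addr) % N of the result (the rotation step).
        """
        out = bytearray(self.n)
        for k in range(self.n):
            j = (k - addr) % self.n
            out[j] = self.banks[k][self.row(addr + j)]
        return bytes(out)


def is_conflict_free(mem: BankedByteMemory, addrs: Iterable[int]) -> bool:
    # An access format can be served in one parallel cycle only if
    # no two of its byte addresses map to the same bank.
    banks = [mem.module(a) for a in addrs]
    return len(set(banks)) == len(banks)


if __name__ == "__main__":
    mem = BankedByteMemory(n_banks=4, rows=8)  # 32-bit bus, 32 bytes total
    mem.write_bytes(0, bytes(range(32)))
    print(list(mem.load(5)))                      # unaligned 4-byte load
    print(is_conflict_free(mem, range(5, 9)))     # successive bytes
    print(is_conflict_free(mem, [0, 2, 4, 6]))    # stride-2 bytes
```

With four banks, the stride-2 format maps to banks 0, 2, 0, 2 and therefore conflicts. Simple interleaving serves successive bytes well but not every useful pattern, which is the kind of limitation that the more versatile parallel memory schemes discussed in the paper aim to remove.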