On Design of Parallel Memory Access Schemes for Video Coding

Authors:
Jarno K. Tanskanen;Reiner Creutzburg;Jarkko T. Niittylahti
Affiliations:
Department of Information Technology, Institute of Digital and Computer Systems, Tampere University of Technology, Tampere, Finland FIN-33101;Department of Computer Science, Fachhochschule Brandenburg, University of Applied Sciences, Brandenburg, Germany D-14737;Department of Information Technology, Tampere University of Technology, Institute of Digital and Computer Systems, Tampere, Finland FIN-33101
Venue:
Journal of VLSI Signal Processing Systems
Year:
2005

Citing 21
Cited 5

An Efficient Memory System for Image Processing

IEEE Transactions on Computers
On Linear Skewing Schemes and d-Ordered Vectors

IEEE Transactions on Computers
Hierarchical parallel memory systems and multiperiodic skewing schemes

Journal of Parallel and Distributed Computing
On access and alignment of data in a parallel processor

Information Processing Letters
Perfect Latin squares and parallel array access

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Parallel access to rectangles

Recent issues in pattern analysis and recognition
Efficient address generation in a parallel processor

Information Processing Letters
Architecture of an Array Processor Using a Nonlinear Skewing Scheme

IEEE Transactions on Computers
XOR storage schemes for frequently used data patterns

Journal of Parallel and Distributed Computing
Minimization of Memory and Network Contention for Accessing Arbitrary Data Patterns in SIMD Systems

IEEE Transactions on Computers
The 8 by 8 display

ACM Transactions on Graphics (TOG)
Memory Architecture and Parallel Access

Memory Architecture and Parallel Access
Subword Parallelism with MAX-2

IEEE Micro
High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study

IEEE Transactions on Computers
A 3D Skewing and De-skewing Scheme for Conflict-Free Access to Rays in Volume Rendering

IEEE Transactions on Computers
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems

IEEE Transactions on Parallel and Distributed Systems
Latin Squares for Parallel Array Access

IEEE Transactions on Parallel and Distributed Systems
Multiskewing-A Novel Technique for Optimal Parallel Memory Access

IEEE Transactions on Parallel and Distributed Systems
Scalable Parallel Memory Architectures for Video Coding

Journal of VLSI Signal Processing Systems
Architecture and applications of the HiPAR video signal processor

IEEE Transactions on Circuits and Systems for Video Technology
A design study of a 0.25-μm video signal processor

IEEE Transactions on Circuits and Systems for Video Technology

Scalable Parallel Memory Architectures for Video Coding

Journal of VLSI Signal Processing Systems
Configurable data memory for multimedia processing

Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Parallel Memory Architecture for Application-Specific Instruction-Set Processors

Journal of Signal Processing Systems
Parallel memory architecture for TTA processor

SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors

Journal of Signal Processing Systems

Abstract

Some modern powerful digital signal processors (DSPs) have byte-addressable internal data memory. This property is especially valuable in computationally demanding interframe video encoding, where data accesses are typically not aligned to word boundaries. A byte-addressable memory allows a load or store instruction to start at any byte address and to transfer at most as many successive bytes as the data bus can handle in parallel. Perhaps the simplest way to construct such a memory is to use N 8-bit memory modules (banks) accessed in parallel, where N is the data bus width in bytes. However, in addition to accessing successive bytes from any byte address, a memory consisting of parallel memory modules can provide much more versatile addressing capabilities at reasonable implementation cost. Versatile access formats can significantly reduce the need for data reordering in the register file. First, we motivate the use of a parallel memory architecture with versatile access formats as the internal on-chip data memory of a modern DSP. We then describe the notation and give a general view of parallel memory design. We propose example parallel data memory architectures with data access formats that are especially helpful in H.263 encoding and in MPEG-4 core profile motion and texture encoding. The examples are given for data bus widths of 16, 32, 64, and 128 bits. Finally, performance is briefly compared with that of other memory architectures, and area, delay, and power figures are estimated.
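
The simplest construction mentioned in the abstract, N 8-bit modules accessed in parallel, can be illustrated with a small software model. The sketch below is not taken from the paper: it assumes plain low-order interleaving (module = address mod N, row = address div N), and the names `module_and_row`, `make_banks`, and `unaligned_load` are hypothetical. Under that assumption, any N successive byte addresses fall into N different modules, so an unaligned access never needs two bytes from the same module.

```python
def module_and_row(addr, n_modules):
    """Assumed mapping (low-order interleaving), not the paper's scheme:
    module index = addr mod N, row inside the module = addr div N."""
    return addr % n_modules, addr // n_modules


def make_banks(data, n_modules):
    """Distribute a flat byte sequence over n_modules 8-bit modules."""
    rows = -(-len(data) // n_modules)  # ceiling division
    banks = [[0] * rows for _ in range(n_modules)]
    for addr, byte in enumerate(data):
        module, row = module_and_row(addr, n_modules)
        banks[module][row] = byte
    return banks


def unaligned_load(banks, start, n_modules):
    """Read n_modules successive bytes starting at any byte address.

    Each byte comes from a different module, so in hardware all
    modules could be read in the same cycle."""
    used_modules = set()
    result = []
    for offset in range(n_modules):
        module, row = module_and_row(start + offset, n_modules)
        # Conflict check: no module may be accessed twice in one load.
        assert module not in used_modules
        used_modules.add(module)
        result.append(banks[module][row])
    return result


data = list(range(32))       # byte at address a holds the value a
banks = make_banks(data, 4)  # N = 4 models a 32-bit data bus
print(unaligned_load(banks, 5, 4))  # [5, 6, 7, 8]
```

In this model the load starting at address 5 touches modules 1, 2, 3, and 0, reading row 1 from the first three and row 2 from the last. A hardware implementation would additionally need per-module row addresses and a rotation of the output bytes, which the loop order hides here.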