Some powerful modern digital signal processors (DSPs) have byte-addressable internal data memory. This property is especially valuable in computationally demanding inter-frame video encoding, where data accesses are typically not aligned to word boundaries. A byte-addressable memory allows a load or store command to start at any byte address and to transfer at most as many successive bytes from the subsequent addresses as the data bus can handle in parallel. Perhaps the simplest way to construct such a memory is to access N 8-bit memory modules (banks) in parallel, where N is the data bus width in bytes. However, in addition to byte-addressable access to successive bytes, a memory built from parallel memory modules can provide much more versatile addressing capabilities at a reasonable implementation cost. Versatile access formats can significantly reduce the need for data reordering in the register file. We first motivate the use of a parallel memory architecture with versatile access formats as the internal on-chip data memory of a modern DSP. We then describe the notation and give a general view of parallel memory design. We propose example parallel data memory architecture designs whose data access formats are especially helpful in H.263 encoding and in MPEG-4 core profile motion and texture encoding. The examples are given for data bus widths of 16, 32, 64, and 128 bits. Finally, performance is briefly compared with that of other memory architectures, and area, delay, and power figures are estimated.
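The simplest construction mentioned above, N parallel 8-bit modules serving an unaligned access, can be illustrated with a minimal Python sketch. The abstract does not state how bytes are assigned to modules, so the mapping used here (byte address a is stored in module a mod N at row a div N, i.e. low-order interleaving) is an assumption, and the class and method names are hypothetical. Under that assumption, any N successive byte addresses fall into N distinct modules, so an unaligned access of up to N bytes needs only one read or write per module.

```python
class BankedByteMemory:
    """Toy model of a byte-addressable memory built from N 8-bit modules.

    Assumption (not taken from the abstract): low-order interleaving,
    i.e. byte address a lives in module a % N at row a // N.
    """

    def __init__(self, n_banks, rows_per_bank):
        self.n_banks = n_banks
        # Each module is modeled as an independent array of 8-bit cells.
        self.banks = [bytearray(rows_per_bank) for _ in range(n_banks)]

    def locate(self, addr):
        """Return (module index, row inside the module) for a byte address."""
        return addr % self.n_banks, addr // self.n_banks

    def store(self, addr, data):
        """Write up to N successive bytes starting at any byte address."""
        if len(data) > self.n_banks:
            raise ValueError("one access can carry at most N bytes")
        for offset, value in enumerate(data):
            bank, row = self.locate(addr + offset)
            self.banks[bank][row] = value

    def load(self, addr, count=None):
        """Read up to N successive bytes starting at any byte address."""
        count = self.n_banks if count is None else count
        if count > self.n_banks:
            raise ValueError("one access can carry at most N bytes")
        used_banks = set()
        result = bytearray()
        for offset in range(count):
            bank, row = self.locate(addr + offset)
            # Conflict-free property: each module is touched at most once.
            assert bank not in used_banks
            used_banks.add(bank)
            result.append(self.banks[bank][row])
        return bytes(result)


# Usage: a 32-bit data bus (N = 4) and an access starting at the
# unaligned byte address 3, which spans two memory rows.
mem = BankedByteMemory(n_banks=4, rows_per_bank=8)
mem.store(3, bytes([0x0A, 0x0B, 0x0C, 0x0D]))
word = mem.load(3)
```

In this sketch the access at address 3 reads row 0 of module 3 and row 1 of modules 0 to 2, which shows why each module needs its own row address even for this simplest access format. The more versatile access formats discussed in the abstract would replace the `locate` mapping with other module assignment functions; those are not modeled here.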