In modern multimedia applications, the memory bottleneck can be alleviated with strided data accesses. Parallel memories can retrieve the data elements of a stride access in parallel: several memory modules work concurrently to increase memory bandwidth and feed the processor with only the necessary data. Previous research describes arbitrary stride access capability for interleaved memories, in which the skewing scheme is changed at run time according to the stride currently in use. This paper presents improved schemes that are adapted to parallel memories. The proposed parallel memory implementation allows conflict-free accesses with all constant strides, which has not been possible in prior application-specific parallel memories. Moreover, the possible access locations are unrestricted, and the number of data elements accessed equals the number of memory modules. Timing and area estimates are given for an Altera Stratix FPGA and a 0.18 µm CMOS process with memory module counts from 2 to 32. The FPGA results show a 129 MHz clock frequency for a system with 16 memory modules when the read and write latencies are 3 and 2 clock cycles, respectively. The complexity of the proposed system is shown to be a trade-off between application-specific and highly configurable parallel memory systems.
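To make the notion of a stride-dependent skewing scheme concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes a power-of-two module count N and writes the stride as σ·2^s with σ odd. The module number is then taken from the address bits starting at bit s, which is one simple textbook-style mapping that avoids conflicts for N consecutive stride elements. All function names are hypothetical and chosen for illustration only.

```python
def trailing_zeros(stride: int) -> int:
    """Return s such that stride = sigma * 2**s with sigma odd (stride > 0)."""
    return (stride & -stride).bit_length() - 1


def low_order_module(addr: int, n_modules: int) -> int:
    """Plain low-order interleaving: module = addr mod N."""
    return addr % n_modules


def skewed_module(addr: int, stride: int, n_modules: int) -> int:
    """Illustrative stride-dependent mapping (an assumption, not the paper's
    scheme): drop the s low address bits before taking the module number."""
    s = trailing_zeros(stride)
    return (addr >> s) % n_modules


def is_conflict_free(base: int, stride: int, n_modules: int, module_fn) -> bool:
    """True if the N elements base, base+stride, ... hit N distinct modules."""
    modules = {module_fn(base + i * stride) for i in range(n_modules)}
    return len(modules) == n_modules


if __name__ == "__main__":
    N = 8  # assumed power-of-two module count
    for stride in (1, 2, 6, 8, 12):
        plain = is_conflict_free(5, stride, N, lambda a: low_order_module(a, N))
        skew = is_conflict_free(5, stride, N, lambda a: skewed_module(a, stride, N))
        print(f"stride={stride:2d}  low-order: {plain}  stride-dependent: {skew}")
```

In this sketch the mapping depends on the stride, so data stored under one stride setting would have to be laid out accordingly. The sketch also omits the address generation within each module and any data permutation network.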