ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Data caches for superscalar processors
ICS '97 Proceedings of the 11th international conference on Supercomputing
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance of image and video processing with general-purpose processors and media ISA extensions
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Adding a vector unit to a superscalar processor
ICS '99 Proceedings of the 13th international conference on Supercomputing
MPEG-4: multimedia for our time
IEEE Spectrum
Exploiting a new level of DLP in multimedia applications
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Cache performance for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
VIS Speeds New Media Processing
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Implementation and Evaluation of the Complex Streamed Instruction Set
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Memory System Support for Image Processing
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
DLP +TLP Processors for the Next Generation of Media Workloads
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Cost-Effective Hardware Acceleration of Multimedia Applications
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
A Media-Enhanced Vector Architecture for Embedded Memory Systems
A Media-Enhanced Vector Architecture for Embedded Memory Systems
Vector microprocessors
High-bandwidth address generation unit
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
A multi-streaming SIMD multimedia computing engine
Microprocessors & Microsystems
SAD prefetching for MPEG4 using flux caches
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Hi-index | 0.00 |
Vector processors have good performance, cost and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the problem of the memory bandwidth.We propose a novel mechanism suitable for 2-dimensional vector architectures and targeted at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is the extension of the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file.By fetching long blocks of data and by reusing 2-dimensional memory streams at this second-level register file, we obtain a significant increase in the effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide a high robustness to memory latency and a significant reduction of the cache activity, thus reducing power and energy requirements. At the investment of a 50% more area than a regular SIMD register file, we have measured and average speed-up of 13% and the potential for power savings in the L2 cache of a 30%.