Implementation and Evaluation of the Complex Streamed Instruction Set
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
An instruction set extension designed to accelerate multimedia applications is presented and evaluated. In the proposed complex streamed instruction (CSI) set, a single instruction can process vector data streams of arbitrary length and stride. It combines complex memory accesses (with implicit prefetching), program control for vector sectioning, and complex computations on multiple data elements in one operation. CSI thereby eliminates the overhead instructions that applications using MMX-like extensions often need, such as instructions for data sectioning, alignment, reorganization, and packing/unpacking, and it accelerates key multimedia kernels. Simulation results show that a superscalar processor extended with CSI outperforms the same processor enhanced with Sun's VIS extension by a factor of up to 7.77 on key multimedia kernels and by up to 35% on full applications.
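The contrast described above, between explicit sectioning and packing and a single stream operation, can be sketched in C as a semantic model. This sketch makes several assumptions:

- It assumes a 64-bit MMX/VIS-style register of eight 8-bit lanes.
- It uses an unsigned saturating add as the example computation.
- The names `simd_style_add` and `csi_stream_add` are hypothetical. They are not CSI's actual mnemonics, operand format, or encoding.
- Implicit prefetching and the hardware pipeline are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 64-bit MMX/VIS-style register holds eight 8-bit lanes. */
#define SIMD_LANES 8

/* Unsigned 8-bit saturating add, a typical packed media operation. */
static uint8_t sat_add_u8(uint8_t x, uint8_t y) {
    unsigned s = (unsigned)x + (unsigned)y;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Hypothetical MMX/VIS-style path: software sections the stream into
 * register-sized chunks, packs strided elements into a contiguous register
 * image, operates on all lanes, then unpacks only the valid lanes.
 * Returns the number of sections (loop iterations) executed. */
static size_t simd_style_add(uint8_t *dst, const uint8_t *a, ptrdiff_t stride_a,
                             const uint8_t *b, ptrdiff_t stride_b, size_t n) {
    size_t sections = 0;
    for (size_t base = 0; base < n; base += SIMD_LANES) {
        size_t valid = (n - base < SIMD_LANES) ? n - base : SIMD_LANES;
        uint8_t ra[SIMD_LANES] = {0}, rb[SIMD_LANES] = {0}, rd[SIMD_LANES];
        for (size_t i = 0; i < valid; i++) {      /* packing overhead */
            ra[i] = a[(ptrdiff_t)(base + i) * stride_a];
            rb[i] = b[(ptrdiff_t)(base + i) * stride_b];
        }
        for (size_t i = 0; i < SIMD_LANES; i++)    /* models one packed op */
            rd[i] = sat_add_u8(ra[i], rb[i]);
        for (size_t i = 0; i < valid; i++)         /* unpacking overhead */
            dst[base + i] = rd[i];
        sections++;
    }
    return sections;
}

/* Hypothetical model of one CSI-style stream instruction: base addresses,
 * strides and length are operands, and sectioning happens inside the unit,
 * so the program issues no packing, alignment or loop-control instructions. */
static void csi_stream_add(uint8_t *dst, const uint8_t *a, ptrdiff_t stride_a,
                           const uint8_t *b, ptrdiff_t stride_b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = sat_add_u8(a[(ptrdiff_t)i * stride_a], b[(ptrdiff_t)i * stride_b]);
}
```

Consider a 10-element stream read with stride 2. The SIMD-style path runs two sections, each with explicit packing and unpacking. The CSI-style model covers the whole stream in one call. Both produce the same results; only the instruction overhead the program must issue differs.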