Implementation and Evaluation of the Complex Streamed Instruction Set
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
The Complex Streamed Instruction (CSI) set is an architectural paradigm designed to accelerate multimedia applications. These applications are characterized by streaming operations on small-width data elements such as 8-bit pixels or 16-bit audio samples. CSI instructions operate on two-dimensional data streams in a SIMD fashion and can process streams of arbitrary length. In this paper, we evaluate the performance of the CSI architecture on a set of important image processing kernels. These kernels exhibit little data reuse, which results in poor cache performance. Simulation results show that CSI achieves speedups of up to 3.98 (2.60 on average) over Sun's media ISA extension VIS. We also analyze the scalability of VIS and CSI with respect to memory bandwidth. The results show that CSI scales much better than VIS with increasing bandwidth.
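To make the idea of a single instruction operating on a two-dimensional stream of small-width elements more concrete, here is a minimal software sketch. It is not the CSI instruction set itself: the function name `stream_add_saturate`, its parameters (`row_len`, `rows`, `stride`, `section`), and the choice of a saturating 8-bit add are all assumptions made for illustration. The `section` parameter stands in for a hypothetical hardware datapath width; the caller never has to split the stream to fit it, which is the sense in which streams can have arbitrary length.

```python
def stream_add_saturate(mem, base_a, base_b, base_dst,
                        row_len, rows, stride, section=16):
    """Emulate one hypothetical streamed saturating add on 8-bit elements.

    mem      : bytearray modelling byte-addressable memory
    base_*   : start offsets of the two source streams and the destination
    row_len  : number of elements per row of the 2D stream
    rows     : number of rows in the stream
    stride   : distance in bytes between consecutive row starts
    section  : assumed number of elements processed per SIMD step
    """
    for r in range(rows):
        row_off = r * stride
        # The "hardware" walks each row in sections of its own width;
        # the caller issues a single operation for the whole stream.
        for start in range(0, row_len, section):
            end = min(start + section, row_len)
            for i in range(start, end):
                a = mem[base_a + row_off + i]
                b = mem[base_b + row_off + i]
                # Saturate at the 8-bit maximum instead of wrapping around.
                mem[base_dst + row_off + i] = min(a + b, 255)
    return mem
```

With `stride` larger than `row_len`, the same call processes a rectangular sub-block of a larger image, which is one way to read "two-dimensional data streams". A conventional fixed-width SIMD extension would instead need an explicit software loop over rows and over register-sized chunks.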