Current media ISA extensions such as Sun's VIS consist of SIMD-like instructions that operate on short vector registers. To exploit more parallelism in a superscalar processor equipped with such instructions, the issue width has to be increased. In the Complex Streamed Instruction (CSI) set, exploiting more parallelism does not involve issuing more instructions. In this paper we study how the performance of superscalar processors extended with CSI or VIS scales with the amount of parallel execution hardware. Results show that the performance of the CSI-enhanced processor scales very well. For example, increasing the datapath width of the CSI execution unit from 16 to 32 bytes improves kernel-level performance by a factor of 1.56 on average. The VIS-enhanced machine, by contrast, is unable to utilize large amounts of parallel execution hardware efficiently: because of the huge number of instructions that need to be executed, the decode-issue logic becomes a bottleneck.
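The scaling argument can be illustrated with a toy instruction-count model. The sketch below is not the simulator or methodology used in the paper. It assumes 8-byte (64-bit) VIS registers, assumes that a single streamed instruction covers the whole stream, and ignores loads, stores, loop overhead, and memory latency. All function names are hypothetical.

```python
from math import ceil

# Assumption: VIS SIMD instructions operate on 64-bit (8-byte) registers.
VIS_REGISTER_BYTES = 8


def vis_simd_instructions(stream_bytes):
    """SIMD instructions needed to touch every byte once (toy model)."""
    return ceil(stream_bytes / VIS_REGISTER_BYTES)


def vis_issue_cycles(stream_bytes, issue_width):
    """Lower bound on cycles when the issue width is the only limit."""
    return ceil(vis_simd_instructions(stream_bytes) / issue_width)


def csi_instructions(stream_bytes):
    """Assumption: one streamed instruction covers the whole stream."""
    return 1


def csi_execute_cycles(stream_bytes, datapath_bytes):
    """Ideal cycles when the execution unit consumes datapath_bytes per cycle."""
    return ceil(stream_bytes / datapath_bytes)


if __name__ == "__main__":
    n = 1024  # hypothetical stream length in bytes
    print("VIS instructions:", vis_simd_instructions(n))
    for width in (4, 8, 16):
        print(f"VIS issue width {width}: {vis_issue_cycles(n, width)} cycles")
    print("CSI instructions:", csi_instructions(n))
    for datapath in (16, 32):
        print(f"CSI datapath {datapath} B: {csi_execute_cycles(n, datapath)} cycles")
```

In this idealized model, doubling the CSI datapath halves the cycle count while the instruction count stays at one. The VIS cycle count falls only if the issue width grows with the hardware. The factor of 1.56 reported above is lower than the ideal factor of 2 because real kernels include costs that this sketch leaves out.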