Implementation of a streaming execution unit

Authors:
Dmitry Cheresiz;Ben Juurlink;Stamatis Vassiliadis;Harry A. G. Wijshoff
Affiliations:
Computer Engineering Laboratory, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands and Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2 ...;Computer Engineering Laboratory, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands;Computer Engineering Laboratory, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands;Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
Venue:
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Year:
2003



Abstract

The Complex Streamed Instruction (CSI) set is an instruction set extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory and the streams can be of any length. Sectioning (the process of splitting up arbitrary-length streams into fixed-size sections that fit in a vector register), data alignment, and conversion between different packed data types are all performed in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MMX and VIS. This paper presents a detailed design of a unit that can execute CSI instructions under the assumption that it is interfaced with the first-level data cache. In particular, it is shown that the complex, two-dimensional, address-generation calculations can be performed in a pipelined fashion and implemented using a three-stage pipeline with acceptable delay and hardware cost.
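
To make the terms "two-dimensional stream" and "sectioning" concrete, the following is a minimal software sketch of the two ideas. It is not the paper's hardware design or its three-stage pipeline. The parameter names (`base`, `row_len`, `h_stride`, `v_stride`, `length`, `section_size`) are hypothetical and chosen for illustration only; the actual CSI stream descriptor may differ.

```python
from typing import Iterable, Iterator, List


def stream_addresses(base: int, row_len: int, h_stride: int,
                     v_stride: int, length: int) -> Iterator[int]:
    """Yield the byte address of each element of a two-dimensional stream.

    Hypothetical parameters (assumptions, not taken from the paper):
      base     - address of the first element
      row_len  - number of elements per row
      h_stride - byte distance between consecutive elements in a row
      v_stride - byte distance between the starts of consecutive rows
      length   - total number of elements in the stream (any length)
    """
    for i in range(length):
        row, col = divmod(i, row_len)
        yield base + row * v_stride + col * h_stride


def sections(addresses: Iterable[int], section_size: int) -> Iterator[List[int]]:
    """Split an arbitrary-length address stream into fixed-size sections.

    The final section is shorter when the stream length is not a
    multiple of section_size.
    """
    section: List[int] = []
    for addr in addresses:
        section.append(addr)
        if len(section) == section_size:
            yield section
            section = []
    if section:
        yield section


# Example: 20 one-byte elements read from 8-element rows of a
# 64-byte-wide image, processed in sections of 16 elements.
addrs = stream_addresses(base=0x1000, row_len=8, h_stride=1,
                         v_stride=64, length=20)
result = list(sections(addrs, section_size=16))
```

In this example the first section covers two complete rows and the second holds the four remaining elements of the third row. In CSI this splitting is done by the hardware rather than by a software loop, which is the point the abstract makes about sectioning.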