Increasingly wide superscalar processors are experiencing diminishing performance returns while requiring larger portions of die area to be dedicated to control logic rather than to the datapath. As an alternative approach to exploiting parallelism, we are investigating the viability of single-chip vector microprocessors. This paper presents initial results of our investigation, comparing the performance and cost of vector microprocessors with those of aggressive, out-of-order superscalar microprocessors.

On the performance side, we show that vector processors can execute a highly parallel, integer-based application 1.5 to 7.3 times faster than superscalar processors by exploiting parallelism more effectively. This ability stems from the use of vector instructions, which exploit parallelism across loop iterations by implicitly re-scheduling operations and temporally localizing the parallelism. Vector instructions also reduce instruction bandwidth by more than an order of magnitude because they express an abundance of parallelism in a compact encoding.

On the cost side, we show that, to achieve these performance gains, highly parallel, integer-based vector microprocessors are no more costly to implement than existing in-order and out-of-order superscalar microprocessors. One reason is that the organization of a vector register file provides tremendous bandwidth without incurring a large area penalty. A second reason is that the control logic for issuing vector instructions is relatively simple.

Both the performance gains and the cost savings are possible because vector processors rely on a vectorizing compiler, rather than on hardware, to detect parallelism and to express it to the hardware in a compact form. These initial results suggest that transferring this functionality to the compiler offers a tremendous performance/cost benefit.
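The order-of-magnitude reduction in instruction bandwidth can be illustrated with a back-of-the-envelope instruction count. The sketch below is an assumption-laden model, not the paper's methodology: it compares the dynamic instructions a scalar loop issues per element against a strip-mined vector loop in which one vector instruction operates on up to VL elements. The instruction mixes and the vector length VL are illustrative choices.

```python
import math

# Assumed maximum vector length: elements processed per vector instruction.
# (Illustrative value; real machines range widely.)
VL = 64

def scalar_instruction_count(n):
    # Hypothetical scalar loop body for c[i] = a[i] + b[i]:
    # load, load, add, store, plus loop overhead (increment, compare,
    # branch) -> ~7 dynamic instructions per element.
    return 7 * n

def vector_instruction_count(n):
    # Hypothetical strip-mined vector loop: each pass issues roughly
    # set-vector-length, vload, vload, vadd, vstore, plus the same
    # three-instruction loop overhead -> ~8 instructions per VL elements.
    return 8 * math.ceil(n / VL)

n = 1024
scalar = scalar_instruction_count(n)   # 7168 instructions
vector = vector_instruction_count(n)   # 128 instructions
print(scalar, vector, scalar / vector)  # 7168 128 56.0
```

Under these assumed instruction mixes, the vector encoding fetches roughly 56 times fewer instructions for the same work, which is consistent with the abstract's claim of a greater-than-order-of-magnitude reduction: the parallelism across loop iterations is expressed once per vector instruction rather than re-fetched every iteration.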