Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

Authors:
Chris Jesshope
Affiliations:
Massey University, Palmerston North, New Zealand
Venue:
ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Year:
2001

Abstract

This paper examines the combination of two techniques. The first, the use of a vector instruction set, has a long history dating back to pipelined vector supercomputers such as the Cray-1 and its successors. The second, multi-threading, is also well understood. The novel approach proposed here combines both vertical and horizontal micro-threading with vector instruction descriptors. It is shown that a family of threads can represent a vector instruction, including dependencies between the instances of that family (the iterations). This technique implements an n-way loop with very low overhead and tolerates high memory latency. Using micro-threading to handle dependencies between threads makes it possible to trade off instruction-level parallelism against loop parallelism. The paper describes how instruction classes may be instanced as independent parallel micro-threads and illustrates the speed-up that may be obtained compared with a conventional loop.
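
The idea of a family of threads standing in for a loop with dependencies between iterations can be illustrated with a small software model. The sketch below is not the paper's hardware mechanism. It assumes Python generators as stand-ins for micro-threads, a dictionary as a stand-in for whatever synchronisation the hardware provides, and a recurrence `s[i] = s[i-1] + a[i] * b[i]` chosen purely for illustration. All names (`micro_thread`, `run_family`, `conventional_loop`) are hypothetical.

```python
from typing import Dict, Generator, List, Optional


def conventional_loop(a: List[float], b: List[float], initial: float = 0.0) -> List[float]:
    """Reference version: an ordinary sequential loop with a carried value."""
    results = []
    acc = initial
    for i in range(len(a)):
        acc = acc + a[i] * b[i]
        results.append(acc)
    return results


def micro_thread(i: int, a: List[float], b: List[float],
                 shared: Dict[int, float]) -> Generator[None, None, None]:
    """One instance of the family, i.e. one loop iteration (illustrative only)."""
    # Independent part: needs nothing from other iterations.
    product = a[i] * b[i]
    # Dependent part: suspend until the predecessor has published its result.
    while (i - 1) not in shared:
        yield  # give up the pipeline so another thread can run
    shared[i] = shared[i - 1] + product


def run_family(a: List[float], b: List[float], initial: float = 0.0,
               order: Optional[List[int]] = None) -> List[float]:
    """Instance one micro-thread per index and schedule them round-robin."""
    n = len(a)
    order = list(range(n)) if order is None else order
    shared: Dict[int, float] = {-1: initial}  # slot -1 seeds the dependency chain
    ready = [micro_thread(i, a, b, shared) for i in order]
    while ready:
        still_waiting = []
        for thread in ready:
            try:
                next(thread)                 # run until it suspends or finishes
                still_waiting.append(thread)
            except StopIteration:
                pass                         # thread completed its iteration
        ready = still_waiting
    return [shared[i] for i in range(n)]
```

Calling `run_family(a, b)` and `run_family(a, b, order=[3, 2, 1, 0])` on four-element inputs yields the same values as `conventional_loop(a, b)`. In this model the scheduling order does not affect the result, because each thread only proceeds past its dependent step once its predecessor's value is available.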