Communications of the ACM - Special section on computer architecture
Parallel efficiency can be greater than unity
Parallel Computing
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Tradeoffs in the Design of Single Chip Multiprocessors
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Micro-Threading: A New Approach to Future RISC
ACAC '00 Proceedings of the 5th Australasian Computer Architecture Conference
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Performance of a micro-threaded pipeline
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
The Vector-Thread Architecture
IEEE Micro
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Issues and support for dynamic register allocation
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
An introduction to program and thread algebra
CiE'06 Proceedings of the Second conference on Computability in Europe: logical Approaches to Computational Barriers
The challenges of massive on-chip concurrency
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.00 |
This paper looks at a combination of two techniques, one of which, using a vector instruction set, has a long history dating back to pipelined vector supercomputers, such as the Cray 1 and its successors. The other technique, multi-threading, is also well understood. The novel approach proposed in this paper combines both vertical and horizontal micro-threading with vector instruction descriptors. It will be shown that a family of threads can represent a vector instruction with dependencies between the instances of that family, the iterations. This technique gives a very low overhead in implementing an n-way loop and is able to tolerate high memory latency. The use of micro-threading to handle dependencies between threads provides the ability to trade-off between instruction level parallelism and loop parallelism. The paper describes the means by which instruction classes may be instanced as independent parallel micro-threads and illustrates the speed-up that may be obtained compared to using a conventional loop.