Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

  • Authors: Chris Jesshope
  • Affiliations: Massey University, Palmerston North, New Zealand
  • Venue: ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
  • Year: 2001

Abstract

This paper looks at a combination of two techniques. The first, the use of a vector instruction set, has a long history dating back to pipelined vector supercomputers such as the Cray 1 and its successors. The second, multi-threading, is also well understood. The novel approach proposed in this paper combines both vertical and horizontal micro-threading with vector instruction descriptors. It will be shown that a family of threads can represent a vector instruction, with dependencies between the instances of that family (the iterations). This technique gives a very low overhead in implementing an n-way loop and is able to tolerate high memory latency. The use of micro-threading to handle dependencies between threads provides the ability to trade off instruction-level parallelism against loop parallelism. The paper describes the means by which instruction classes may be instanced as independent parallel micro-threads and illustrates the speed-up that may be obtained compared with a conventional loop.