Multithreaded Vector Architectures

Authors:
Roger Espasa, Mateo Valero
Venue:
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Year:
1997

Abstract

The purpose of this paper is to show that multithreading techniques can be applied to a vector processor to greatly increase processor throughput and maximize resource utilization. Using a trace-driven approach, we simulate a selection of the Perfect Club and SPECfp92 programs and compare their execution times on a conventional vector architecture with a single memory port and on a multithreaded vector architecture. A substantial part of the paper is devoted to studying the interaction between multithreading and main memory latency. The paper focuses on maximizing the usage of the memory port, the most expensive resource in typical vector computers. We also study the cost of duplicating the vector register file. Overall, for realistic memory latencies, multithreading gives this architecture a speedup of more than 1.4 and can drive the utilization of the single memory port as high as 95%.
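To build intuition for why extra threads raise memory port utilization and why the gain depends on memory latency, here is a minimal back-of-the-envelope sketch. It is not the paper's trace-driven simulator. It assumes a simple saturation model: each hypothetical thread repeats a round in which a vector load occupies the single port for `vl` cycles, the thread then waits `latency` cycles for the data, and spends `compute` cycles in the arithmetic units before its next load. Independent threads fill each other's idle port cycles until the port saturates. The register file helper assumes hypothetical sizes (8 vector registers of 128 64-bit elements per context) purely to show how storage scales with the number of hardware contexts. All function names and parameter values are illustrative.

```python
def port_utilization(threads: int, vl: int, latency: int, compute: int) -> float:
    """Fraction of cycles the single memory port is busy (toy saturation model).

    Hypothetical assumption: every thread repeats a round of
    `vl` port cycles + `latency` wait cycles + `compute` arithmetic cycles.
    """
    if threads < 1 or vl < 1 or latency < 0 or compute < 0:
        raise ValueError("invalid model parameters")
    round_length = vl + latency + compute  # cycles in one thread's round
    # Each thread wants the port for vl cycles per round; the port saturates at 1.0.
    return min(1.0, threads * vl / round_length)


def speedup(threads: int, vl: int, latency: int, compute: int) -> float:
    """Throughput gain over one thread, assuming a memory-bound workload.

    In this toy model throughput is proportional to port utilization.
    """
    return port_utilization(threads, vl, latency, compute) / port_utilization(
        1, vl, latency, compute
    )


def register_file_bytes(contexts: int, regs: int = 8, elems: int = 128,
                        elem_bytes: int = 8) -> int:
    """Storage needed when the vector register file is replicated per context.

    The default sizes are hypothetical, chosen only for illustration.
    """
    return contexts * regs * elems * elem_bytes


if __name__ == "__main__":
    # Longer memory latency leaves more idle port cycles for extra threads to fill.
    for latency in (20, 50, 100):
        for threads in (1, 2, 4):
            u = port_utilization(threads, vl=64, latency=latency, compute=30)
            print(f"latency={latency:3d} threads={threads} utilization={u:.2f}")
    print("register file bytes for 2 contexts:", register_file_bytes(2))
```

Because the model ignores chaining, dependences between loads, and contention for functional units, it overstates the benefit; it is meant only to show the trend that higher latency leaves more port idle time for additional threads to reclaim, at the price of one extra register file per context.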