This paper shows that multithreading techniques can be applied to a vector processor to greatly increase processor throughput and maximize resource utilization. Using a trace-driven approach, we simulate a selection of the Perfect Club and SPECfp92 programs and compare their execution time on a conventional vector architecture with a single memory port against their execution time on a multithreaded vector architecture. A substantial part of the paper studies the interaction between multithreading and main memory latency. We focus on maximizing the usage of the memory port, the most expensive resource in typical vector computers, and we also examine the cost of duplicating the vector register file. Overall, multithreading gives this architecture a performance advantage of more than a factor of 1.4 for realistic memory latencies and can drive the utilization of the single memory port as high as 95%.
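As a rough illustration of why extra thread contexts can raise memory port utilization, the following toy model may help. It is only a sketch under strong assumptions and is not the trace-driven simulator used in the paper: every thread is assumed to repeat the same pattern (one vector memory operation that holds the port for `vector_length` cycles, followed by `stall` cycles of latency and computation before its next memory operation), and the names `port_utilization`, `vector_length`, `stall`, and `ops_per_thread` are hypothetical.

```python
def port_utilization(num_threads, vector_length=64, stall=200, ops_per_thread=100):
    """Toy model (an assumption, not the paper's simulator): return the
    fraction of cycles during which a single shared memory port is busy."""
    # Cycle at which each thread is allowed to issue its next memory operation.
    ready_at = [0] * num_threads
    # Number of memory operations each thread still has to issue.
    remaining = [ops_per_thread] * num_threads
    port_free_at = 0   # first cycle at which the port is idle again
    busy_cycles = 0    # total cycles the port spent transferring data

    while any(remaining):
        # Among threads with work left, serve the one that became ready first.
        t = min((i for i in range(num_threads) if remaining[i]),
                key=lambda i: ready_at[i])
        # The operation starts once both the port and the thread are ready.
        start = max(port_free_at, ready_at[t])
        port_free_at = start + vector_length
        busy_cycles += vector_length
        # The thread waits for the data and computes before its next access.
        ready_at[t] = port_free_at + stall
        remaining[t] -= 1

    # Total elapsed time is the cycle at which the last transfer finishes.
    return busy_cycles / port_free_at


if __name__ == "__main__":
    for n in (1, 2, 4, 5):
        print(n, "threads:", round(port_utilization(n), 3))
```

In this model a single thread leaves the port idle for most of each `vector_length + stall` period, while additional threads fill those idle cycles until the port saturates. The parameter values are arbitrary and chosen only to make the trend visible; they are not taken from the paper's measurements.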