Low-complexity vector microprocessor extension

Authors:
David A. Patterson;Joseph James Gebis
Affiliations:
University of California, Berkeley;University of California, Berkeley
Venue:
Low-complexity vector microprocessor extension
Year:
2008

Citing 0
Cited 3

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems

Mat-core: a decoupled matrix core extension for general-purpose processors
Neural, Parallel & Scientific Computations

Design, implementation, and evaluation of a low-complexity vector-core for executing scalar/vector instructions
Journal of Parallel and Distributed Computing

Abstract

For the last few years, single-thread performance has been improving at a snail’s pace. Power limitations, increasing relative memory latency, and the exhaustion of improvement in instruction-level parallelism are forcing microprocessor architects to examine new processor design strategies. In this dissertation, I take a look at a technology that can improve the efficiency of modern microprocessors: vectors. Vectors are a simple, power-efficient way to take advantage of common data-level parallelism in an extensible, easily programmable manner. My work focuses on the process of transitioning from traditional scalar microprocessors to computers that can take advantage of vectors. First, I describe a process for extending existing single-instruction, multiple-data instruction sets to support full vector processing in a way that remains binary compatible with existing applications. Initial implementations can be low cost, yet can be transparently extended to higher performance later. I also describe ViVA, the Virtual Vector Architecture. ViVA adds vector-style memory operations to existing microprocessors but does not include arithmetic datapaths; instead, memory instructions work with a new buffer placed between the core and second-level cache. ViVA serves as a low-cost way to get much of the performance of full vector memory hierarchies while avoiding the complexity of adding a full vector system. Finally, I test the performance of ViVA by modifying a cycle-accurate full-system simulator to support ViVA’s operation. After extensive calibration, I test the basic performance of ViVA using a series of microbenchmarks. I then compare the performance of a variety of ViVA configurations on two kernels: corner turn, used in processing multidimensional data, and sparse matrix-vector multiplication, used in many scientific applications. Results show that ViVA can give significant benefit for a variety of memory access patterns, without relying on a costly hardware prefetcher.
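
For readers unfamiliar with the two kernels named in the abstract, the following is a minimal sketch of their memory access patterns in plain Python. It is not ViVA code and not taken from the dissertation: the function names, the flat row-major layout, and the choice of the compressed sparse row (CSR) format are all assumptions made for illustration. Corner turn (a matrix transpose) reads with a fixed stride, while sparse matrix-vector multiplication reads through an index array; these strided and indexed patterns are what vector-style memory operations typically target.

```python
def corner_turn(flat, rows, cols):
    """Transpose a rows x cols matrix stored row-major in a flat list.

    Illustrative only: each inner-loop read steps through `flat` with a
    stride of `cols` elements (a strided access pattern).
    """
    out = [0] * (rows * cols)
    for j in range(cols):
        for i in range(rows):
            out[j * rows + i] = flat[i * cols + j]  # stride-`cols` read
    return out


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A in CSR form (assumed format).

    `values` holds the nonzeros, `col_idx` their column indices, and
    `row_ptr[r]:row_ptr[r + 1]` delimits the nonzeros of row r.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # indexed (gather-style) read of x
        y.append(acc)
    return y


# Small usage example with hypothetical data.
transposed = corner_turn([1, 2, 3, 4, 5, 6], rows=2, cols=3)

# A = [[2, 0, 1],
#      [0, 3, 0]]
product = spmv_csr(values=[2, 1, 3], col_idx=[0, 2, 1], row_ptr=[0, 2, 3], x=[1, 2, 3])
```

In both loops the address of each element depends on a stride or an index rather than on simple sequential order, which is why these kernels are commonly used to stress a memory system.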