The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software-controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over data movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full-system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving a 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power-efficient manner.
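To make the memory-bound nature of the evaluated kernels concrete, the following is a minimal sketch (not from the paper) of sparse matrix-vector multiplication over a compressed sparse row (CSR) matrix; the function and variable names are illustrative. The indirect gather through `col_idx` produces the irregular access pattern that defeats hardware prefetchers and that a vector-style memory interface like ViVA is designed to hide.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr : row i's nonzeros occupy vals[row_ptr[i]:row_ptr[i+1]]
    col_idx : column index of each stored nonzero
    vals    : value of each stored nonzero
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # vals/col_idx are streamed contiguously, but x is gathered
            # through an index vector -- the memory-bound inner loop.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

For example, the 2x2 matrix [[1, 2], [0, 3]] is stored as `row_ptr=[0, 2, 3]`, `col_idx=[0, 1, 1]`, `vals=[1.0, 2.0, 3.0]`, and multiplying by `x=[1.0, 1.0]` yields `[3.0, 3.0]`.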