Stream Register Files with Indexed Access

Authors:
Nuwan Jayasena;Mattan Erez;Jung Ho Ahn;William J. Dally
Affiliations:
Stanford University;Stanford University;Stanford University;Stanford University
Venue:
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Year:
2004

Citing 0
Cited 15

Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Analysis and Performance Results of a Molecular Modeling Application on Merrimac

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
High-throughput sketch update on a low-power stream processor

Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
Executing irregular scientific applications on stream architectures

Proceedings of the 21st annual international conference on Supercomputing
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Proceedings of the 21st annual international conference on Supercomputing
Transform coding on programmable stream processors

The Journal of Supercomputing
Exploiting loop-dependent stream reuse for stream processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

ACM Transactions on Architecture and Code Optimization (TACO)
SARA: StreAm register allocation

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
The research of an embedded processor element for multimedia domain

MCAM'07 Proceedings of the 2007 international conference on Multimedia content analysis and mining
Xetal-Pro: an ultra-low energy and high throughput SIMD processor

Proceedings of the 47th Design Automation Conference
Exploiting the reuse supplied by loop-dependent stream references for stream processors

ACM Transactions on Architecture and Code Optimization (TACO)
Improving GPU performance via large warps and two-level warp scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A stream architecture supporting multiple stream execution models

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many current programmable architectures designed to exploit data parallelism require computation to be structured to operate on sequentially accessed vectors or streams of data. Applications with less regular data access patterns perform sub-optimally on such architectures. This paper presents a register file for streams (SRF) that allows arbitrary, indexed accesses. Compared to sequential SRF access, indexed access captures more temporal locality, reduces data replication in the SRF, and provides efficient support for certain types of complex access patterns. Our simulations show that indexed SRF access provides speedups of 1.03x to 4.1x and memory bandwidth reductions of up to 95% over sequential SRF access for a set of benchmarks representative of data-parallel applications with irregular accesses. Indexed SRF access also provides greater speedups than caches for a number of application classes despite significantly lower hardware costs. The area overhead of our indexed SRF implementation is 11%-22% over a sequentially accessed SRF, which corresponds to a modest 1.5%-3% increase in the total die area of a typical stream processor.