Conflict-Free Vector Access Using a Dynamic Storage Scheme

Authors:
David T. Harper, III; Darel A. Linebarger
Venue:
IEEE Transactions on Computers
Year:
1991



Abstract

This paper considers an approach in which conflict-free access at any constant stride is achieved by selecting a storage scheme for each vector according to the access patterns used with that vector. By factoring the stride into two components, one a power of 2 and the other relatively prime to 2 (that is, odd), a storage scheme that allows conflict-free access to the vector at the specified stride can be synthesized. All such schemes are based on a variation of the row rotation mechanism proposed by P. Budnik and D. Kuck. Each storage scheme is defined by two parameters: one describes the type of rotation to perform, and the other describes the amount of memory to be rotated as a single block. The performance of the memory under access strides other than the one used to specify the storage scheme is also considered. These other strides model a vector that is accessed with multiple strides, as well as situations in which the stride cannot be determined before the vector is initialized. Simulation results show that if a single buffer is added to each memory port, the average performance of the dynamic scheme surpasses that of the interleaved scheme for arbitrary-stride accesses.
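
The stride factorization described in the abstract can be illustrated with a small sketch. The code below is not the construction from the paper: the function names, the choice of 2**m memory modules, and the particular rotation rule in `rotated_bank` (each row of 2**m addresses is rotated by the row index modulo 2**x) are assumptions made here for illustration only, and the sketch assumes x <= m. It compares plain low-order interleaving against this hypothetical row rotation by counting how many distinct modules are touched in any window of 2**m consecutive accesses.

```python
def factor_stride(stride):
    """Split a positive stride into (sigma, x) with stride == sigma * 2**x and sigma odd."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    x = 0
    while stride % 2 == 0:
        stride //= 2
        x += 1
    return stride, x


def interleaved_bank(addr, m):
    """Low-order interleaving across 2**m modules."""
    return addr % (1 << m)


def rotated_bank(addr, m, x):
    """Hypothetical row-rotation mapping (illustrative assumption, not the paper's scheme).

    Each row of 2**m consecutive addresses is rotated by (row index mod 2**x)
    modules. Intended for 0 <= x <= m.
    """
    row = addr >> m
    return (addr + row % (1 << x)) % (1 << m)


def worst_window(bank_of, base, stride, m, count=256):
    """Smallest number of distinct modules hit by any 2**m consecutive accesses."""
    n = 1 << m
    banks = [bank_of(base + i * stride) for i in range(count)]
    return min(len(set(banks[i:i + n])) for i in range(count - n + 1))


if __name__ == "__main__":
    m = 3  # 8 modules (assumed for the demonstration)
    for stride in (1, 2, 4, 6, 12):
        sigma, x = factor_stride(stride)
        plain = worst_window(lambda a: interleaved_bank(a, m), 5, stride, m)
        rotated = worst_window(lambda a: rotated_bank(a, m, x), 5, stride, m)
        print(f"stride={stride} sigma={sigma} x={x} "
              f"interleaved={plain} rotated={rotated}")
```

Under plain interleaving, a stride with power-of-2 factor 2**x reaches only 2**(m - x) of the modules, while the rotation chosen from x spreads the same accesses over all 2**m modules in this sketch. Using a rotation derived from one stride with a different access stride is the mismatch case whose performance the abstract says is also evaluated.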