Algorithmic foundations for a parallel vector access memory system

Authors:
Binu K. Mathew;Sally A. McKee;John B. Carter;Al Davis
Affiliations:
Department of Computer Science, University of Utah, Salt Lake City, UT;Department of Computer Science, University of Utah, Salt Lake City, UT;Department of Computer Science, University of Utah, Salt Lake City, UT;Department of Computer Science, University of Utah, Salt Lake City, UT
Venue:
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Year:
2000

Abstract

This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The Parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems.
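
To make the idea of each bank deciding which vector elements it holds more concrete, here is a minimal Python sketch for the word-interleaved case. It is an illustration under stated assumptions, not the algorithm from the paper: addresses are counted in words, word address a is assumed to live in bank a mod num_banks, and a vector is described by a hypothetical (base, stride, length) triple. The function names are invented for this example. The closed-form version solves the congruence base + i*stride = bank (mod num_banks) for the element index i, and a brute-force scan is included as a cross-check.

```python
from math import gcd


def elements_in_bank_bruteforce(base, stride, length, num_banks, bank):
    """Reference version: test every element index in turn."""
    return [i for i in range(length)
            if (base + i * stride) % num_banks == bank]


def elements_in_bank(base, stride, length, num_banks, bank):
    """Closed-form version (assumed word interleaving, not the paper's method).

    Solves stride * i == (bank - base) (mod num_banks) for i.
    Requires Python 3.8+ for pow(x, -1, m).
    """
    g = gcd(stride, num_banks)
    delta = (bank - base) % num_banks
    if delta % g != 0:
        # The congruence has no solution: this bank holds no element.
        return []
    period = num_banks // g          # hits in this bank recur every `period` elements
    if period == 1:
        first = 0                    # every element falls in this bank
    else:
        inv = pow((stride // g) % period, -1, period)
        first = (delta // g) * inv % period
    return list(range(first, length, period))


if __name__ == "__main__":
    # Vector of 10 words starting at word address 5 with stride 3, 8 banks.
    for b in range(8):
        print(b, elements_in_bank(5, 3, 10, 8, b))
```

In this sketch each bank can compute its own element list independently from the broadcast (base, stride, length) triple, which is the property the abstract relies on. A stride that shares a factor with the number of banks (for example stride 4 with 8 banks) concentrates all elements in a subset of the banks, which the empty lists returned for the remaining banks make visible.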