A discussion is presented of the use of dynamic storage schemes to improve parallel memory performance during three important classes of data accesses: vector accesses in which multiple strides are used to access a single vector, block accesses, and constant-geometry FFT accesses. The schemes investigated are based on linear address transformations, also known as XOR schemes. It has been shown that this class of schemes can be implemented more efficiently in hardware and has more flexibility than schemes based on row rotations or other techniques. Several analytical results are shown. These include: quantitative analysis of buffering effects in pipelined memory systems; design rules for storage schemes that provide conflict-free access using multiple strides, blocks, and FFT access patterns; and an analysis of the effects of memory bank cycle time on storage scheme capabilities.
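
To make the idea of an XOR scheme concrete, the following is a minimal sketch and not the specific schemes analyzed in the paper. It assumes a hypothetical memory with 2^m banks (m = 3 here) and one simple linear transformation chosen for illustration: the bank number is the low m address bits XORed with the next m address bits. The function names `xor_bank`, `interleaved_bank`, and `distinct_banks` are illustrative only. The sketch counts how many distinct banks are touched by 2^m consecutive elements of a strided vector, compared with plain low-order interleaving.

```python
def interleaved_bank(addr, m=3):
    """Plain low-order interleaving: bank = low m address bits."""
    return addr & ((1 << m) - 1)


def xor_bank(addr, m=3):
    """Illustrative XOR scheme (an assumption, not the paper's design):
    bank = (low m bits) XOR (next m bits)."""
    mask = (1 << m) - 1
    return (addr & mask) ^ ((addr >> m) & mask)


def distinct_banks(bank_fn, base, stride, m=3):
    """Number of distinct banks hit by 2**m consecutive vector elements.

    A result of 2**m means those accesses are spread over all banks,
    i.e. no two of them collide in the same bank.
    """
    n = 1 << m
    banks = [bank_fn(base + i * stride, m) for i in range(n)]
    return len(set(banks))


if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16):
        print(
            stride,
            distinct_banks(interleaved_bank, 0, stride),
            distinct_banks(xor_bank, 0, stride),
        )
```

Under these assumptions, strides 1, 2, 4, and 8 starting at address 0 each reach all eight banks with the XOR mapping, whereas plain interleaving sends every stride-8 access to a single bank. Stride 16 still collides under this particular mapping, which suggests why design rules are needed to choose a transformation that covers the set of strides a program actually uses.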