On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
Performance evaluation of vector accesses in parallel memories using a skewed storage scheme
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
An aperiodic storage scheme to reduce memory conflicts in vector processors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Analysis of vector access performance on skewed interleaved memory
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Conflict-Free Vector Access Using a Dynamic Storage Scheme
IEEE Transactions on Computers
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Conflict-free access of vectors with power-of-two strides
ICS '92 Proceedings of the 6th international conference on Supercomputing
Synchronized access to streams in SIMD vector multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Vector multiprocessors with arbitrated memory access
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Semi-linear and bi-base storage schemes classes: general overview and case study
ICS '95 Proceedings of the 9th international conference on Supercomputing
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Minimizing Conflicts Between Vector Streams in Interleaved Memory Systems
IEEE Transactions on Computers
Algorithmic foundations for a parallel vector access memory system
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Increasing the effective bandwidth of complex memory systems in multivector processors
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Conflict-Free Access for Streams in Multimodule Memories
IEEE Transactions on Computers
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Adaptive History-Based Memory Schedulers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A DRAM/SRAM Memory Scheme for Fast Packet Buffers
IEEE Transactions on Computers
Impulse: Memory system support for scientific applications
Scientific Programming
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Memory scheduling for modern microprocessors
ACM Transactions on Computer Systems (TOCS)
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Hi-index | 0.01 |
Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free vector access for some strides in vector processors with multi-module memories. In this paper, we extend these schemes to achieve this conflict-free access for a larger number of strides. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. Both matched and unmatched memories are considered: we show that the number of strides is even larger for the latter case. The hardware for address calculations and access control is described and shown to be of similar complexity as that required for access in order.