On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
A Simulation Study of the CRAY X-MP Memory System
IEEE Transactions on Computers
Vector Computer Memory Bank Contention
IEEE Transactions on Computers
Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
Scrambled storage for parallel memory systems
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A dynamic storage scheme for conflict-free vector access
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Semi-linear and bi-base storage schemes classes: general overview and case study
ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluation of Neural and Genetic Algorithms for Synthesizing Parallel Storage Schemes
International Journal of Parallel Programming
High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study
IEEE Transactions on Computers
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Configurable parallel memory architecture for multimedia computers
Journal of Systems Architecture: the EUROMICRO Journal
Multiaccess Memory System for Attached SIMD Computer
IEEE Transactions on Computers
Sams: single-affiliation multiple-stride parallel memory scheme
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Configurable data memory for multimedia processing
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
High-bandwidth Address Generation Unit
Journal of Signal Processing Systems
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance
Proceedings of the 24th ACM International Conference on Supercomputing
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
Proceedings of the 8th ACM International Conference on Computing Frontiers
Hi-index | 14.99 |
A technique to analyze transformation matrices is presented. This technique is based on decomposing complex transformations into elementary transformations. When combined with a factorization of the access stride into two components, one a power of 2 and the other relatively prime to 2, the technique leads to an algorithmic synthesis of a CF (conflict free) storage scheme. Additionally, because the address to storage location mapping arithmetic is performed modulo 2, the time required to transform an address to its corresponding storage location is smaller and the hardware cost is lower than if schemes based on row rotation were used.