In this paper, we analyze the problem of supporting conflict-free access for multiple stride families in parallel memory schemes targeted at SIMD processing systems. We propose a Single-Affiliation Multiple-Stride (SAMS) scheme that supports both unit-stride and strided conflict-free vector memory accesses. We compare our scheme against previously proposed techniques that use buffers and inter-vector out-of-order access. The main advantage of our proposal is that atomic parallel access is supported without limiting the vector lengths, which provides better support for short vectors. Our scheme also utilizes memory module resources better than solutions that require additional modules. Synthesis results for a Virtex2-Pro FPGA reconfigurable platform indicate that the address translation of the SAMS scheme has an efficient hardware implementation, with a logic delay of less than 3 ns and trivial hardware resource utilization.
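To make the notion of a bank conflict concrete, the following minimal sketch counts which memory modules a short vector access touches under two address mappings. It is an illustration only, not the SAMS address translation, which the abstract does not specify. The module count `M = 8`, the function names, and the classic row-skewing mapping are all assumptions chosen for the example.

```python
M = 8  # number of memory modules (assumed for illustration)


def low_order(addr, m=M):
    """Plain low-order interleaving: module index is addr mod m."""
    return addr % m


def row_skewed(addr, m=M):
    """Classic row skewing: rotate each row of m words by its row index."""
    return (addr + addr // m) % m


def modules_touched(base, stride, mapping, m=M):
    """Module index hit by each of the m elements of one vector access."""
    return [mapping(base + i * stride, m) for i in range(m)]


def is_conflict_free(base, stride, mapping, m=M):
    """True when all m elements fall in distinct modules."""
    return len(set(modules_touched(base, stride, mapping, m))) == m


if __name__ == "__main__":
    for stride in (1, 2):
        for name, mapping in (("low-order", low_order),
                              ("row-skewed", row_skewed)):
            hits = modules_touched(0, stride, mapping)
            print(stride, name, hits, is_conflict_free(0, stride, mapping))
```

With base address 0, low-order interleaving serves a unit-stride access from eight distinct modules but maps a stride-2 access onto only four of them, so that access cannot complete in one parallel step. The skewed mapping spreads the same stride-2 access over all eight modules. Neither mapping in this sketch is conflict-free for every base and stride; for example, the skewed mapping conflicts on a unit-stride access starting at address 3.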