The performance of a vector processor accessing vectors stored in memory depends strongly on the conflicts produced in the memory subsystem, because these conflicts delay the functional units. Conflicts can occur between elements of the same vector and between elements of different vector streams, and it is known that the latter kind is the main cause of lost cycles. This paper proposes an order for accessing the elements of a vector stream that reduces the average memory access time in vector processors when several vector streams are accessed concurrently. With the proposed order, the memory system observes the same stride for all the vector streams of a stride family. Conflicts between concurrent vector streams of the same family are completely eliminated if the rate at which memory modules are requested is less than or equal to their service rate; in the remaining cases, the number of cycles lost to conflicts is dramatically reduced.
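The abstract does not spell out how the order is built. The sketch below shows one way an access order with the stated property could be constructed. It is not presented as the paper's method and rests on several assumptions: low-order interleaving with 2^m modules, and a "stride family" s defined as all strides of the form sigma * 2^s with sigma odd. Under those assumptions, each block of 2^(m-s) consecutive elements touches each of the family's modules exactly once. Reordering the elements inside every block by the inverse of sigma modulo 2^(m-s) then makes the memory see the module sequence of stride 2^s, whatever the odd factor sigma is. The helper names `stride_family`, `module_of` and `family_order` are hypothetical.

```python
from typing import List, Tuple


def stride_family(stride: int) -> Tuple[int, int]:
    """Split a positive stride into (sigma, s) with stride == sigma * 2**s and sigma odd."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    s = (stride & -stride).bit_length() - 1  # position of the lowest set bit
    return stride >> s, s


def module_of(address: int, m: int) -> int:
    """Module index under (assumed) low-order interleaving with 2**m modules."""
    return address % (1 << m)


def family_order(base: int, stride: int, length: int, m: int) -> List[int]:
    """Hypothetical helper: element indices in an order that makes the modules
    visited look like a stride of 2**s, where s is the stride's family.

    base does not change the order; it only shifts every module index by a constant.
    Assumes length is a multiple of 2**(m - s) when s < m.
    """
    sigma, s = stride_family(stride)
    if s >= m:
        # Every element maps to the same module; no reordering can help.
        return list(range(length))
    period = 1 << (m - s)  # distinct modules touched by one block of the stream
    if length % period:
        raise ValueError("length must be a multiple of 2**(m - s)")
    inv = pow(sigma, -1, period)  # sigma is odd, hence invertible mod a power of two
    order = []
    for block in range(0, length, period):
        # Element block + j lands in module (base + j*sigma*2**s) mod 2**m, so picking
        # j = k * sigma^-1 (mod period) sends the k-th access to base + k*2**s.
        for k in range(period):
            order.append(block + (k * inv) % period)
    return order
```

For example, with 8 modules (m = 3), a stream of stride 3 and a stream of stride 5 both belong to family s = 0. Reordered this way, each visits modules 0, 1, 2, ..., 7 in sequence, exactly as a unit-stride stream would. A stream of stride 6 (family s = 1) starting at address 1 visits modules 1, 3, 5, 7, 1, 3, 5, 7, the same sequence as stride 2.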