An efficient routing control for the SIGMA network Σ(4)
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Performance evaluation of vector accesses in parallel memories using a skewed storage scheme
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A new interconnection network for SIMD computers: the sigma networks
IEEE Transactions on Computers
Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
Odd memory systems may be quite interesting
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Scalable parallel memory architecture with a skew scheme
ICS '93 Proceedings of the 7th international conference on Supercomputing
Synchronized access to streams in SIMD vector multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Vector multiprocessors with arbitrated memory access
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Semi-linear and bi-base storage schemes classes: general overview and case study
ICS '95 Proceedings of the 9th international conference on Supercomputing
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Increasing the effective bandwidth of complex memory systems in multivector processors
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Conflict-Free Accesses to Strided Vectors on a Banked Cache
IEEE Transactions on Computers
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
High-bandwidth Address Generation Unit
Journal of Signal Processing Systems
PPT: joint performance/power/thermal management of DRAM memory for multi-core systems
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Hi-index | 0.00 |
On many commercial supercomputers, several vector register processors share a global highly interleaved memory in a MIMD mode. When all the processors are working on a single vector loop, a significant part of the potential memory throughput may be wasted due to the asynchronism of the processors.In order to limit this loss of memory throughput, a SIMD synchronization mode for vector accesses to memory may be used. But an important part of the memory bandwidth may be wasted when accessing vectors with an even stride.In this paper, we present IPS, an interleaved parallel scheme, which ensures an equitable distribution of elements on a highly interleaved memory for a wide range a vector strides. We show how to organize access to memory, such that unscrambling of vectors from memory to the vector register processors requires a minimum number of passes through the interconnection network.