An application of number theory to the organization of raster-graphics memory
Journal of the ACM (JACM) - The MIT Press scientific computation series
Performance evaluation of vector accesses in parallel memories using a skewed storage scheme
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A new interconnection network for SIMD computers: the sigma networks
IEEE Transactions on Computers
Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A ultra fast Euclidean division algorithm for prime memory systems
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Interleaved parallel schemes: improving memory throughput on supercomputers
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Semi-linear and bi-base storage schemes classes: general overview and case study
ICS '95 Proceedings of the 9th international conference on Supercomputing
Nonprime Memory Systems and Error Correction in Address Translation
IEEE Transactions on Computers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Hi-index | 0.01 |
Using a prime number of N of memory banks on a vector processor allows a conflict-free access for any slice of N consecutive elements of a vector stored with a stride not multiple of N.To reject the use of a prime (or odd) number N of memory banks, it is generally advanced that address computation for such a memory system would require systematic Euclidean Division by the number N. We first show that the well known Chinese Remainder Theorem allows to define a very simple mapping of data onto the memory banks for which address computation does not require any Euclidean Division.Massively parallel SIMD computers may have several thousands of processors. When the memory on such a machine is globally shared, routing vectors from memory to the processors is a major difficulty; the control for the interconnection network cannot be generally computed at execution time. When the number of memory banks and processors is a product of prime numbers, the family of permutations needed for routing vectors for memory to the processors through the interconnection network have very specific properties. The Chinese Remainder Network presented in the paper is able to execute all these permutations in a single path and may be self-routed.