The disparity between processing speed and data access rates is a serious bottleneck in pipelined and vector processors. In an interleaved memory system, bank conflicts can be alleviated by skewing, which matters for scientific computations that operate on a variety of submatrices. So far, uniskewing schemes based on periodic and linear functions have been studied. Such schemes face several difficulties: they require a prime number of memory modules, may waste memory space, or need complex addressing functions and alignment networks. We present a new technique, termed multiskewing, which applies multiple functions to different sections of the array. Each of these functions may be as simple as a linear shift. We show that, among its advantages, multiskewing does not require a prime number of memory modules, achieves a memory utilization factor of 100%, maintains the logical structure of the array, and allows optimal memory access to a large class of submatrices.
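To make the bank-conflict problem concrete, the following minimal sketch models a single linear skewing function (the uniskewing case) and counts how many distinct banks a row, a column, and the main diagonal touch. It is an illustration under stated assumptions, not the multiskewing scheme of the paper: the mapping `(skew * i + j) mod num_banks` is the textbook linear skew, and the names `bank`, `distinct_banks`, `access_patterns`, and `report` are hypothetical helpers introduced here.

```python
def bank(i, j, num_banks, skew):
    """Assumed linear skew: element (i, j) lives in bank (skew*i + j) mod num_banks."""
    return (skew * i + j) % num_banks


def distinct_banks(cells, num_banks, skew):
    """Number of different banks touched by a set of (i, j) cells."""
    return len({bank(i, j, num_banks, skew) for i, j in cells})


def access_patterns(n):
    """Three n-element access patterns of an n x n array (hypothetical choice)."""
    return {
        "row": [(0, j) for j in range(n)],
        "column": [(i, 0) for i in range(n)],
        "diagonal": [(i, i) for i in range(n)],
    }


def report(num_banks, skew):
    """Distinct banks per pattern; a value equal to num_banks means no conflict."""
    patterns = access_patterns(num_banks)
    return {name: distinct_banks(cells, num_banks, skew)
            for name, cells in patterns.items()}


if __name__ == "__main__":
    print(report(4, 0))  # no skew, 4 banks
    print(report(4, 1))  # unit skew, 4 banks
    print(report(5, 1))  # unit skew, 5 banks (prime)
```

In this model, four banks without skew serve a column from a single bank. A unit skew fixes the column but leaves the diagonal spread over only two banks, while five banks with the same skew serve all three patterns without conflict. This is the kind of dependence on a prime number of modules that the abstract lists as a difficulty of uniskewing.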