This paper addresses the problem of constructing an array processor with N processing elements, N memories, and an interconnection network that provides conflict-free access and alignment of various N-vectors of N × N arrays, including rows, columns, diagonals, contiguous blocks, and distributed blocks, where N is any even power of two. Linear skewing schemes offer no solution to this problem; the solution developed here uses a nonlinear skewing scheme instead and leads to a simple, efficient array processor architecture. In particular, the memory organization requires O(log N) gates to generate memory addresses for any of the N-vectors simultaneously in O(1) time. The interconnection structure accomplishes data alignment for any of the N-vectors with a single pass through a network of O(N log N) gates. Because the system uses the minimum number of memories, it allows both processing elements and memories to achieve the highest possible utilization.
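To make the claim about linear skewing concrete, the following minimal sketch brute-forces the smallest case, N = 4. It rests on several assumptions that the abstract does not spell out: a linear skewing scheme is taken to store element (i, j) in module (a*i + b*j) mod N; "diagonals" means the main diagonal and the anti-diagonal; contiguous blocks are the aligned √N × √N sub-arrays; and distributed blocks collect elements spaced √N apart in both directions. The names `n_vectors`, `conflicts`, and `TABLE` are hypothetical, and `TABLE` is a hand-picked non-linear assignment used only for illustration, not the construction developed in the paper.

```python
from itertools import product
from math import isqrt


def n_vectors(n):
    """Yield (name, cells) for every N-vector pattern checked in this sketch."""
    s = isqrt(n)  # block side length; n is assumed to be a perfect square
    for k in range(n):
        yield f"row {k}", [(k, c) for c in range(n)]
        yield f"column {k}", [(r, k) for r in range(n)]
    yield "main diagonal", [(i, i) for i in range(n)]
    yield "anti-diagonal", [(i, n - 1 - i) for i in range(n)]
    # Contiguous blocks: aligned s-by-s sub-arrays (assumed definition).
    for br, bc in product(range(0, n, s), repeat=2):
        yield (f"block ({br},{bc})",
               [(br + i, bc + j) for i in range(s) for j in range(s)])
    # Distributed blocks: elements spaced s apart in both directions (assumed).
    for r, c in product(range(s), repeat=2):
        yield (f"distributed block ({r},{c})",
               [(r + i, c + j) for i in range(0, n, s) for j in range(0, n, s)])


def conflicts(module, n):
    """Return the names of patterns whose n cells do not map to n distinct modules."""
    return [name for name, cells in n_vectors(n)
            if len({module(i, j) for i, j in cells}) != n]


N = 4

# Try every linear scheme (a*i + b*j) mod N and keep the conflict-free ones.
linear_ok = [(a, b) for a, b in product(range(N), repeat=2)
             if not conflicts(lambda i, j: (a * i + b * j) % N, N)]

# A hand-picked non-linear assignment for N = 4 (illustrative only).
TABLE = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]
table_conflicts = conflicts(lambda i, j: TABLE[i][j], N)

print("conflict-free linear schemes:", linear_ok)
print("conflicts for the sample table:", table_conflicts)
```

Under these assumptions, no linear scheme survives for N = 4. Conflict-free rows and columns force both a and b to be odd, which makes a + b even, so two main-diagonal elements land in the same module. The sample table passes every pattern checked.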