In parallel matrix manipulation, certain data patterns must be accessed in one memory cycle without conflict. After investigating the frequently used data patterns, we propose a powerful skewing scheme that allows most frequently used data patterns of N elements, including rows, columns, diagonals, blocks of various shapes, points scattered over various blocks, and chessboards of various shapes, to be accessed in one memory cycle. We also propose simple methods for combining different skewing schemes into a single parallel storage system such that all the frequently used data patterns of N elements (the above patterns plus folded lines, two-pairs, and column-pairs) can be accessed in one memory cycle. The storage system uses N memory modules, where N is any (even or odd) power of two. Address generation in the system needs only exclusive-or operations and can be completed in constant time. The storage scheme allows matrices of different sizes to be processed efficiently on large-scale systems by using a skewing scheme designed according to the size of the system; that is, the address generation mechanism is independent of the size of the matrices to be processed. Data alignment requirements (connecting each memory module to the proper processor) can be easily realized on a general-purpose interconnection network, such as a hypercube.
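To make the idea of exclusive-or address generation concrete, the following is a minimal sketch of a textbook XOR skew, not the scheme proposed in this paper. It assumes N is a power of two and assigns matrix element (i, j) to module (i XOR j) mod N. The names `module_of` and `is_conflict_free` are hypothetical helpers introduced only for this illustration.

```python
def module_of(i: int, j: int, n_modules: int) -> int:
    """Hypothetical XOR skew: pick the memory module for element (i, j).

    Assumption: n_modules is a power of two, so masking with
    n_modules - 1 is equivalent to taking the value modulo n_modules.
    """
    if n_modules <= 0 or n_modules & (n_modules - 1):
        raise ValueError("n_modules must be a power of two")
    return (i ^ j) & (n_modules - 1)


def is_conflict_free(cells, n_modules: int) -> bool:
    """Return True if no two cells of the pattern share a module."""
    modules = [module_of(i, j, n_modules) for i, j in cells]
    return len(set(modules)) == len(modules)


N = 8  # number of memory modules (illustrative value)
row = [(3, j) for j in range(N)]        # one full row
column = [(i, 5) for i in range(N)]     # one full column
diagonal = [(k, k) for k in range(N)]   # main diagonal

print(is_conflict_free(row, N))
print(is_conflict_free(column, N))
print(is_conflict_free(diagonal, N))
```

Under this simple skew, rows and columns are conflict-free because XOR with a fixed value permutes the module indices, while the main diagonal maps every element to module 0. Patterns such as diagonals and blocks therefore require a richer mapping than this sketch provides.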