The degree to which high-speed vector processors approach their peak performance is closely tied to the amount of interference they encounter while accessing vectors in memory. In this paper, we evaluate a storage scheme that reduces the average memory access time in a vector-oriented architecture. A skewing scheme maps vector components into parallel memory modules such that, for most vector access patterns, the number of memory conflicts is lower than that observed in interleaved parallel memory systems. Address and data buffers are used locally in each module so that transient nonuniformities, which occur in some access patterns, do not degrade performance. Previous investigations into skewing techniques have attempted to provide conflict-free access for a limited subset of access patterns. The goal of this investigation is different: the skewing scheme evaluated here does not eliminate all memory conflicts, but it does improve the average performance of vector access over interleaved systems for a wide range of strides. It is shown that little extra hardware is required to implement the skewing scheme. Also, far fewer restrictions are placed on the number of memory modules in the system than in other proposed schemes.
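The contrast between interleaved and skewed placement can be illustrated with a small sketch. The abstract does not give the paper's actual skewing function, so the linear row-shift used here, and the names `interleaved_module`, `skewed_module`, and `max_module_load`, are assumptions chosen only for illustration. The sketch counts how many elements of a strided vector land in the busiest module, as a rough proxy for conflicts, and does not model the per-module buffers.

```python
from collections import Counter


def interleaved_module(addr, num_modules):
    """Plain low-order interleaving: module = address mod M."""
    return addr % num_modules


def skewed_module(addr, num_modules, skew=1):
    """Hypothetical linear skew (an assumption, not the paper's scheme):
    each row of M consecutive addresses is rotated by `skew` modules
    relative to the previous row."""
    row = addr // num_modules
    return (addr + skew * row) % num_modules


def max_module_load(mapping, stride, length, num_modules, base=0):
    """Largest number of vector elements that fall in any single module
    for a vector of `length` elements accessed with the given stride."""
    counts = Counter(
        mapping(base + i * stride, num_modules) for i in range(length)
    )
    return max(counts.values())


if __name__ == "__main__":
    M = 8  # number of memory modules (illustrative value)
    for stride in (1, 7, 8):
        plain = max_module_load(interleaved_module, stride, M, M)
        skewed = max_module_load(skewed_module, stride, M, M)
        print(f"stride={stride}: interleaved={plain}, skewed={skewed}")
```

With 8 modules, a stride of 8 sends every element to one module under plain interleaving, while this skew spreads the same elements across all 8 modules. A stride of 7 shows the opposite effect for this particular skew, which is consistent with the abstract's point that skewing improves average behavior across strides without removing every conflict.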