Conflict-Free Access for Streams in Multimodule Memories

Authors:
Tomás Lang; Mateo Valero; Montse Peiron; Eduard Ayguadé
Venue:
IEEE Transactions on Computers
Year:
1995

Abstract

Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, these schemes are conflict-free only for some strides. In this paper, we extend them to achieve conflict-free access for a larger number of strides. The basic idea is to access a stream of fixed length out of order. The stream is then stored in a local memory and used in subsequent instructions. This mode of operation is suitable for vector processors and for processors with decoupled access. The proposed scheme and mode of operation produce the largest possible number of conflict-free strides. Memory systems with any ratio between the number of memory modules and the memory latency are considered. The hardware for address calculation and access control is described and shown to be of complexity similar to that required for in-order access.
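
The effect of out-of-order access can be illustrated with a small simulation. The sketch below is not the scheme or the hardware described in the paper; it assumes a simple row-skewed mapping, 8 memory modules, a module busy time of 8 cycles, and one request issued per cycle, and all function names are hypothetical. It shows a stride-2 stream that conflicts when accessed in order but becomes conflict-free once its requests are reordered so that each module is revisited only after a full pass over all modules.

```python
from collections import defaultdict


def module_of(addr, num_modules):
    # Illustrative row-skewed mapping (an assumption, not the paper's scheme):
    # each row of num_modules addresses is rotated by one more module.
    return (addr + addr // num_modules) % num_modules


def is_conflict_free(addresses, num_modules, latency):
    # One request issues per cycle; a module stays busy for `latency` cycles.
    last_issue = {}
    for cycle, addr in enumerate(addresses):
        mod = module_of(addr, num_modules)
        if mod in last_issue and cycle - last_issue[mod] < latency:
            return False
        last_issue[mod] = cycle
    return True


def reorder_by_module(addresses, num_modules):
    # Group requests by module, then issue them in passes that visit the
    # modules in a fixed order. This is a software stand-in for the
    # out-of-order address generation; real hardware would compute the
    # order directly instead of sorting.
    per_module = defaultdict(list)
    for addr in addresses:
        per_module[module_of(addr, num_modules)].append(addr)
    passes = max(len(queue) for queue in per_module.values())
    order = []
    for p in range(passes):
        for mod in range(num_modules):
            if p < len(per_module[mod]):
                order.append(per_module[mod][p])
    return order


NUM_MODULES = 8   # assumed number of memory modules
LATENCY = 8       # assumed module busy time in cycles
stream = [2 * i for i in range(16)]  # base 0, stride 2, length 16

in_order_ok = is_conflict_free(stream, NUM_MODULES, LATENCY)
reordered = reorder_by_module(stream, NUM_MODULES)
reordered_ok = is_conflict_free(reordered, NUM_MODULES, LATENCY)
print(in_order_ok, reordered_ok)
```

Because the elements arrive out of order, each one must be written to its proper position in the local memory (for example, a vector register) so that subsequent instructions see the stream in its original order.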