Vector multiprocessors with arbitrated memory access

Authors:
Montse Peiron;Mateo Valero;Eduard Ayguadé;Tomás Lang
Affiliations:
Department d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/ Gran Capità s/n, Mòdul D6, 08071 - Barcelona, Spain;Department d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/ Gran Capità s/n, Mòdul D6, 08071 - Barcelona, Spain;Department d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/ Gran Capità s/n, Mòdul D6, 08071 - Barcelona, Spain;Department of Electrical and Computer Engineering, University of California at Irvine
Venue:
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Year:
1995

Abstract

The high latency of memory accesses is one of the main factors limiting the performance of current vector supercomputers. Conflicts in the memory modules, together with collisions in the interconnection network in the case of multiprocessors, significantly increase the execution time of applications. In this work we propose a memory access method that, for both vector uniprocessors and multiprocessors, allows stream accesses to be performed with the smallest possible latency in the majority of cases. The basic idea is to arbitrate memory access by defining the order in which the memory modules are visited. Under this scheme, the stream elements are requested out of order. In addition, the access method reduces the cost of the interconnection network.
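
The basic idea can be illustrated with a small sketch. It is only one possible reading of the abstract, not the method as specified in the paper: it assumes low-order interleaving (module = address mod number of modules) and a fixed cyclic visiting order 0, 1, ..., M-1, and the function names and parameters are hypothetical. It shows how the elements of a strided stream could be requested out of order so that the modules are visited in a predefined order.

```python
from collections import defaultdict


def in_order_modules(base, stride, length, num_modules):
    """Modules touched when stream elements are requested in natural order.

    Assumes low-order interleaving: module = address mod num_modules.
    """
    return [(base + i * stride) % num_modules for i in range(length)]


def arbitrated_order(base, stride, length, num_modules):
    """Reorder element indices so modules are visited in the fixed
    cyclic order 0, 1, ..., num_modules - 1 (an assumed arbitration order).
    """
    # Group element indices by the module that holds them.
    per_module = defaultdict(list)
    for i in range(length):
        per_module[(base + i * stride) % num_modules].append(i)

    # Sweep the modules in the fixed order, taking one pending
    # element from each module per sweep.
    order = []
    sweep = 0
    while len(order) < length:
        for m in range(num_modules):
            pending = per_module.get(m, [])
            if sweep < len(pending):
                order.append(pending[sweep])
        sweep += 1
    return order


if __name__ == "__main__":
    base, stride, length, modules = 0, 3, 8, 8
    print(in_order_modules(base, stride, length, modules))
    order = arbitrated_order(base, stride, length, modules)
    print(order)
    print([(base + i * stride) % modules for i in order])
```

With a stride of 3 over 8 modules, the natural request order jumps between modules, whereas the reordered request sequence walks the modules in the predefined order while still covering every element exactly once.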