Performance of cached DRAM organizations in vector supercomputers

Authors:
W.-C. Hsu;J. E. Smith
Affiliations:
-;-
Venue:
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Year:
1993

Citing 2
Cited 13

The NAS parallel benchmarks—summary and preliminary results

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Software on the brink

IEEE Spectrum - Supercomputing

Accounting for memory bank contention and delay in high-bandwidth multiprocessors

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Algorithmic foundations for a parallel vector access memory system

Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
High-Performance DRAMs in Workstation Environments

IEEE Transactions on Computers
Cache performance in vector supercomputers

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Cached DRAM for ILP Processor Memory Access Latency Reduction

IEEE Micro
VL-CDRAM: variable line sized cached DRAMs

Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Design and Optimization of Large Size and Low Overhead Off-Chip Caches

IEEE Transactions on Computers
Near-memory Caching for Improved Energy Consumption

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Near-Memory Caching for Improved Energy Consumption

IEEE Transactions on Computers
Sams: single-affiliation multiple-stride parallel memory scheme

Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
A case for exploiting subarray-level parallelism (SALP) in DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.01

Visualization

Abstract

DRAMs containing cache memory are studied in the context of vector supercomputers. In particular, we consider systems where processors have no internal data caches and memory reference streams are generated by vector instructions. For this application, we expect that cached DRAMs can provide high bandwidth at relatively low cost.We study both DRAMs with a single, long cache line and with smaller, multiple cache lines. Memory interleaving schemes that increase data locality are proposed and studied. The interleaving schemes are also shown to lead to non-uniform bank accesses, i.e. hot banks. This suggest there is an important optimization problem involving methods that increase locality to improve performance, but not so much that hot banks diminish performance. We show that for uniprocessor systems, both types of cached DRAMs work well with the proposed interleave methods. For multiprogrammed multiprocessors, the multiple cache line DRAMs work better.