A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors

Authors:
Sally A. McKee; William A. Wulf
Venue:
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Year:
1996

Abstract

The growing disparity between processor and memory speeds has made memory bandwidth the performance bottleneck for many applications. In particular, this performance gap severely impacts stream-oriented computations such as (de)compression, encryption, and scientific vector processing. This paper describes the development of an intelligent memory interface that exploits compiler-provided information about streamed memory access patterns to improve memory bandwidth. Simulation results show that shared-memory multiprocessor systems equipped with such an interface can deliver nearly the full attainable bandwidth at relatively modest hardware cost.
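
The following toy model illustrates why the order of streamed accesses can matter for memory bandwidth. It is a minimal sketch, not the paper's interface or simulator. It assumes a single memory bank that keeps one page open, with made-up costs for page hits and misses. The function names, the page size, the cycle counts, and the daxpy-style loop are all hypothetical choices made for illustration.

```python
# Toy model (assumption): one DRAM bank, one open page at a time.
# An access to the open page is cheap; any other access reopens a page.

def access_cost(addresses, page_words=512, hit_cycles=1, miss_cycles=5):
    """Cycles needed to service word addresses in the given order."""
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // page_words
        if page == open_page:
            cycles += hit_cycles
        else:
            cycles += miss_cycles
            open_page = page
    return cycles


def natural_order(n, x_base, y_base):
    """Order issued by a daxpy-style loop: read x[i], read y[i], write y[i]."""
    order = []
    for i in range(n):
        order += [x_base + i, y_base + i, y_base + i]
    return order


def batched_order(n, x_base, y_base, batch):
    """Same accesses, grouped per stream in batches (hypothetical reordering)."""
    order = []
    for start in range(0, n, batch):
        stop = min(start + batch, n)
        order += [x_base + i for i in range(start, stop)]  # read a batch of x
        order += [y_base + i for i in range(start, stop)]  # read a batch of y
        order += [y_base + i for i in range(start, stop)]  # write a batch of y
    return order


if __name__ == "__main__":
    n, x_base, y_base = 1024, 0, 4096  # x and y sit in different pages
    print("natural order:", access_cost(natural_order(n, x_base, y_base)))
    print("batched order:", access_cost(batched_order(n, x_base, y_base, 64)))
```

In this model the interleaved order keeps switching between the pages holding x and y, whereas grouping accesses by stream pays the page-miss cost only once per batch. Real memory systems have multiple banks, refresh, and bus turnaround costs that this sketch ignores.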