Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?

Authors:
Ricardo Bianchini; Thomas J. LeBlanc
Affiliations:
University of Rochester, USA; University of Rochester, USA
Venue:
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Year:
1994


Abstract

An important architectural design decision affecting the performance of coherent caches is the choice of block size. There are two primary factors that influence this choice: the reference behavior of applications and the remote access bandwidth and latency of the machine. Given that we anticipate increases in both network bandwidth and latency (in processor cycles) in scalable shared-memory multiprocessors, the question arises as to what effect these increases will have on the choice of block size. We use analytical modeling and execution-driven simulation of parallel programs on a large-scale shared-memory machine to examine the relationship between cache block size and application performance as a function of remote access bandwidth and latency. We show that even under assumptions of high remote access bandwidth and latency, the best application performance usually results from using cache blocks between 32 and 128 bytes in size. We also show that modifying the program to remove the dominant source of misses may not increase the best performing block size. We conclude that large cache blocks cannot be justified in most realistic scenarios.
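The trade-off the abstract describes can be illustrated with a deliberately simple cost model. The sketch below is not the paper's analytical model. It assumes that each remote miss costs a fixed latency plus a transfer time proportional to block size, and that the number of misses at each block size is known. The function names, the parameters, and above all the miss counts are hypothetical and chosen only for illustration.

```python
def miss_penalty(block_bytes, latency_cycles, bytes_per_cycle):
    """Cycles to service one remote miss: fixed latency plus transfer time.

    Assumption: transfer time scales linearly with block size.
    """
    return latency_cycles + block_bytes / bytes_per_cycle


def total_miss_cycles(misses_by_block, latency_cycles, bytes_per_cycle):
    """Total cycles spent on misses for each candidate block size."""
    return {
        block: misses * miss_penalty(block, latency_cycles, bytes_per_cycle)
        for block, misses in misses_by_block.items()
    }


def best_block_size(misses_by_block, latency_cycles, bytes_per_cycle):
    """Block size with the lowest total miss cost under this toy model."""
    costs = total_miss_cycles(misses_by_block, latency_cycles, bytes_per_cycle)
    return min(costs, key=costs.get)


# HYPOTHETICAL miss counts, NOT data from the paper. They only mimic a
# common shape: misses fall as blocks grow (spatial locality), then rise
# again (for example, from false sharing).
HYPOTHETICAL_MISSES = {16: 1000, 32: 600, 64: 400, 128: 350, 256: 500, 512: 900}

if __name__ == "__main__":
    for latency, bandwidth in [(20, 1), (200, 8)]:
        best = best_block_size(HYPOTHETICAL_MISSES, latency, bandwidth)
        print(f"latency={latency} cycles, bandwidth={bandwidth} B/cycle: "
              f"best block = {best} bytes")
```

In this toy setting, raising latency and bandwidth together shifts the preferred block size upward only as far as the miss counts allow. Once misses start to rise with block size, no amount of bandwidth makes a larger block cheaper, which is the kind of interaction the paper studies with its own model and simulations.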