Boosting the Performance of Shared Memory Multiprocessors

Authors:
Per Stenström;Mats Brorsson;Fredrik Dahlgren;Håkan Grahn;Michel Dubois
Venue:
Computer
Year:
1997

Abstract

Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared memory multiprocessors is the cache-coherent nonuniform memory access (CC-NUMA) machine, which has private caches and a cache coherence protocol. Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and hybrid update/invalidate with a write cache. The four optimizations differ in which application features they target, what hardware resources they require, and what constraints they impose on the application software. The authors measured the performance improvement from the four optimizations in isolation and in combination, and examined the trade-offs in hardware and programming complexity. Although one combination of the proposed optimizations (prefetching and migratory sharing detection) can boost a sequentially consistent machine to perform as well as a machine with release consistency, release consistency models offer significant performance improvements across a broad application domain at little extra complexity in the machine design. Moreover, a combination of sequential prefetching and hybrid update/invalidate with a write cache cuts the execution time of a sequentially consistent machine by half with fairly modest changes to the second-level cache and the cache protocol. The authors expect that designers will increasingly turn to the release consistency model.
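
To make one of the four optimizations concrete, the sketch below illustrates the general idea behind sequential prefetching: on a demand miss to a block, the next few consecutive blocks are fetched as well. It is a toy model, not the authors' design. The function name `count_misses`, the unbounded cache with no evictions, and the fixed `prefetch_degree` are all assumptions made for illustration; the adaptive scheme named in the abstract would vary the degree at run time, which is not modeled here.

```python
def count_misses(block_trace, prefetch_degree):
    """Count demand misses for a trace of block addresses.

    Hypothetical model: the cache is unbounded (no evictions), and each
    demand miss to block b also fetches blocks b+1 .. b+prefetch_degree.
    """
    cached = set()
    misses = 0
    for block in block_trace:
        if block not in cached:
            misses += 1
            cached.add(block)
            # Sequential prefetch: bring in the next consecutive blocks.
            for k in range(1, prefetch_degree + 1):
                cached.add(block + k)
    return misses


if __name__ == "__main__":
    trace = list(range(16))  # purely sequential walk over 16 blocks
    for degree in (0, 1, 3):
        print(degree, count_misses(trace, degree))
```

On a purely sequential trace, the miss count in this model falls as the prefetch degree grows. On a trace with no spatial locality, the extra fetches would add traffic without removing misses, which is presumably the motivation for adapting the degree.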