A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors

Authors:
Manuel E. Acacio;José González;José M. García;José Duato
Affiliations:
-;-;-;-
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Citing 10
Cited 7

The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
Design and Performance of Directory Caches for Scalable Shared Memory Multiprocessors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
A New Scalable Directory Architecture for Large-Scale Multiprocessors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration

IEEE Transactions on Parallel and Distributed Systems
Evaluating IA-32 web servers through simics: a practical experience

Journal of Systems Architecture: the EUROMICRO Journal
Proximity-aware directory-based coherence for multi-core processor architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An adaptive cache coherence protocol for chip multiprocessors

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Cache management for discrete processor architectures

ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware and the network interface/router. In this work we exploit such integration scale, presenting a novel node architecture aimed at reducing the long L2 miss latencies and the memory overhead of using directories that characterize cc-NUMA machines and limit their scalability. Our proposal replaces the traditional directory with a novel threelevel directory architecture and adds a small shared data cache to each of the nodes of a multiprocessor system. Due to their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node. A taxonomy of the L2 misses, according to the actions performed by the directory to satisfy them is also presented. Using execution-driven simulations, we show significant L2 miss latency reductions (more than 60% in some cases). These important improvements translate into reductions of more than 30% in the application execution time in some cases.