Scalable directory architecture for distributed shared memory chip multiprocessors

Authors:
Huan Fang;Mats Brorsson
Affiliations:
KTH School of Information and Communication Technology;KTH School of Information and Communication Technology
Venue:
ACM SIGARCH Computer Architecture News
Year:
2009

Citing 7
Cited 0

Directory-Based Cache Coherence in Large-Scale Multiprocessors

Computer
An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration

IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
An efficient cache design for scalable glueless shared-memory multiprocessors

Proceedings of the 3rd conference on Computing frontiers
A novel lightweight directory architecture for scalable shared-memory multiprocessors

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional Directory-based cache coherence protocol is far from optimal for large-scale cache coherent shared memory multiprocessors due to the increasing latency to access directories stored in DRAM memory. Instead of keeping directories in main memory, we consider distributing the directory together with L2 cache across all nodes on a Chip Multiprocessor. Each node contains a processing unit, a private L1 cache, a slice of L2 cache, memory controller and a router. Both L2 cache and memories are distributed shared and interleaved by a subset of memory address bits. All nodes are interconnected through a low latency two dimensional Mesh network. Directory, being a split component to L2 cache, only stores sharing information for blocks while L2 cache stores only data blocks exclusive with L1 cache. Shared L2 cache can increase total effective cache capacity on chip, but also increase the miss latency when data is on a remote node. Being different from Directory Cache structure, our proposal totally removes the directory from memory, which saves memory space and reduces access latency. Compared to L2 cache that combines directory information internally, our L2 cache structure saves up to 88% cache space and achieves similar performance.