Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors

  • Authors:
  • Ravi Iyer; Laxmi Narayan Bhuyan

  • Venue:
  • HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
  • Year:
  • 1999

Abstract

Cache-coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from high remote memory access latencies, caused by comparatively slow memory technology and by data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique called the switch cache. The main idea is to implement small, fast caches in the crossbar switches of the interconnect to capture and store shared data as they flow from the memory module to the requesting processor. The stored data then serve subsequent requests for the same data, reducing the latency of remote memory accesses. Implementing a cache in a crossbar switch needs to be efficient and robust, yet flexible enough to accommodate changes in the caching protocol. We present the design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, which uses wormhole routing with virtual channels. Using detailed execution-driven simulations, we find that the CAESAR switch cache improves the performance of CC-NUMA multiprocessors by reducing the number of reads served at distant remote memories by up to 45% and improving application execution time by as much as 20%. We conclude that switch caches provide a cost-effective solution for designing high-performance CC-NUMA multiprocessors.
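
The core idea in the abstract, switches that keep a copy of data passing from memory to a requester and then serve later requests themselves, can be illustrated with a toy model. The sketch below is not the CAESAR design: the class and function names (SwitchCache, remote_read), the LRU replacement policy, the per-switch capacity, and the explicit invalidate hook standing in for coherence actions are all assumptions made for illustration, and wormhole routing and virtual channels are not modeled.

```python
from collections import OrderedDict


class SwitchCache:
    """Toy LRU cache standing in for the cache inside one crossbar switch.

    Hypothetical model: LRU replacement and the capacity are assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # block address -> data

    def lookup(self, addr):
        """Return the cached data for addr, or None on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        return None

    def fill(self, addr, data):
        """Store a block, evicting the least recently used one if full."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def invalidate(self, addr):
        """Placeholder for a coherence action (assumed, e.g. on a write)."""
        self.lines.pop(addr, None)


def remote_read(path, memory, addr):
    """Read addr through the switches in path (requester side first).

    Returns (data, served_by): served_by is the index of the switch that
    hit, or None if the request went all the way to the memory module.
    """
    for index, switch in enumerate(path):
        data = switch.lookup(addr)
        if data is not None:
            return data, index
    # Miss everywhere: memory serves the read, and every switch on the
    # return path captures the block as it flows back to the requester.
    data = memory[addr]
    for switch in path:
        switch.fill(addr, data)
    return data, None


if __name__ == "__main__":
    s0, s1, s2, s3 = (SwitchCache(capacity=2) for _ in range(4))
    memory = {0x40: "blk"}
    # Two processors whose routes to the home memory share s1 and s2.
    print(remote_read([s0, s1, s2], memory, 0x40))  # served by memory
    print(remote_read([s3, s1, s2], memory, 0x40))  # served by shared s1
```

In this model the second processor's read stops at the first shared switch instead of travelling to the remote memory, which is the kind of saving the reported reduction in reads served at remote memories refers to. A real design would also have to keep the switch copies coherent with the protocol, which the invalidate hook only gestures at.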