The impact of shared-cache clustering in small-scale shared-memory multiprocessors

  • Authors:
  • B. A. Nayfeh; K. Olukotun; J. P. Singh

  • Venue:
  • HPCA '96: Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
  • Year:
  • 1996

Abstract

As processor performance continues to increase, greater demands are placed on the bus and memory systems of small-scale shared-memory multiprocessors. In this paper, we investigate how to reduce these demands by organizing groups of processors into clusters that are then connected together by a shared global bus. We take advantage of the high-bandwidth, low-latency interconnections available from multichip module (MCM) technology to build clusters of multiple high-performance processors that share an L2 cache. MCM technology allows significantly lower shared-cache access times, and higher shared-cache-to-processor bandwidth, than is possible with printed circuit board (PCB) designs. Our results show that for an eight-processor bus-based system, bus contention can account for a large portion of the overall execution time, and that clustering can eliminate much or all of it. Clustering also tends to reduce read stall times, owing to shared-working-set effects and a reduction in the impact of communication misses. The same holds for two- and four-processor systems, although to a lesser extent. Overall, we find that clustering can yield significant performance gains for applications that make heavy use of the memory system.
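The abstract credits clustering with cutting global-bus traffic because processors in a cluster share one L2 cache: a block fetched by one processor is already present for its cluster-mates (a shared-working-set effect), and data passed between processors in the same cluster never crosses the global bus (fewer communication misses). The sketch below is a hypothetical toy model illustrating that mechanism, not the authors' simulator or results. It assumes unlimited-capacity cluster caches, a simple write-invalidate protocol, and counts only global-bus transactions, so it ignores timing, bus arbitration, and the extra interference that a shared cache can suffer. The names `bus_transactions` and `shared_working_set_trace` are illustrative.

```python
from typing import Dict, List, Set, Tuple

# (processor id, 'R' or 'W', block address)
Access = Tuple[int, str, int]


def bus_transactions(trace: List[Access], num_procs: int, procs_per_cluster: int) -> int:
    """Count global-bus transactions in an idealized write-invalidate system.

    Hypothetical toy model, not the paper's simulator:
    - each cluster of `procs_per_cluster` processors shares one cache of
      unlimited capacity (so there are no capacity or conflict misses),
    - a read that misses in the cluster cache costs one bus transaction,
    - a write costs one bus transaction unless the writing cluster already
      holds the only valid copy of the block.
    """
    if num_procs % procs_per_cluster != 0:
        raise ValueError("processors must divide evenly into clusters")
    holders: Dict[int, Set[int]] = {}  # block -> clusters holding a valid copy
    bus = 0
    for proc, op, addr in trace:
        cluster = proc // procs_per_cluster
        sharers = holders.setdefault(addr, set())
        if op == "R":
            if cluster not in sharers:
                bus += 1  # miss in the shared cluster cache goes to the global bus
                sharers.add(cluster)
        elif sharers != {cluster}:  # op == "W" and other clusters may hold copies
            bus += 1  # read-exclusive or invalidate broadcast on the global bus
            holders[addr] = {cluster}  # all other clusters' copies are invalidated
    return bus


def shared_working_set_trace(num_procs: int = 8, num_blocks: int = 4) -> List[Access]:
    """Every processor reads the same blocks; processor 0 then updates block 0
    and every other processor reads the new value (a communication pattern)."""
    trace: List[Access] = [(p, "R", b) for p in range(num_procs) for b in range(num_blocks)]
    trace.append((0, "W", 0))
    trace += [(p, "R", 0) for p in range(1, num_procs)]
    return trace


if __name__ == "__main__":
    trace = shared_working_set_trace()
    for size in (1, 2, 4, 8):
        print(size, bus_transactions(trace, 8, size))
```

For an eight-processor system this prints `1 40`, `2 20`, `4 10`, and `8 4`: with private caches (cluster size 1) every processor misses on every shared block and on the updated block, while larger clusters let cluster-mates hit in the shared cache. The numbers describe only this artificial trace; real gains depend on the application's sharing pattern and on capacity interference in the shared cache.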