Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks

  • Authors:
  • Cristina Hristea; Daniel Lenoski; John Keen

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA; Silicon Graphics, Inc.; Silicon Graphics, Inc.

  • Venue:
  • SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
  • Year:
  • 1997

Abstract

Even with today's large caches, the increasing performance gap between processors and memory systems imposes a memory bottleneck for many important scientific and commercial applications. This bottleneck is intensified in shared-memory multiprocessors by contention and the effects of cache coherency. Under heavy memory contention, memory latency may increase by a factor of 2 or 3. Nonetheless, as more sophisticated techniques are used to hide latency and increase bandwidth, measuring memory performance has become increasingly difficult. Previous simple methods to measure memory performance can overestimate uniprocessor memory latency and underestimate bandwidth by tens of percent. This paper introduces a micro benchmark suite that measures memory hierarchy performance in light of both uniprocessor optimizations and the contention and coherence effects of multiprocessors. The benchmark suite has been used to improve the memory system performance of the SGI Origin multiprocessor.
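
To make the idea of a latency micro benchmark concrete, here is a minimal sketch of pointer chasing, a widely used way to time dependent loads. It is not taken from the paper and may differ from the suite the abstract describes; the function names (build_cycle, chase, ns_per_load) and the use of clock() for timing are assumptions made for illustration. Each load depends on the result of the previous one, so overlapping and prefetching have less opportunity to hide the latency being measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: fill next[] with one random cycle over n slots
   (Sattolo's algorithm), so following next[] visits every slot once
   before returning to the start. rand() is used for brevity; its range
   limits how evenly very large arrays are shuffled. */
static void build_cycle(size_t *next, size_t n)
{
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads and return the final index, so the
   compiler cannot discard the loop as dead code. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Average CPU nanoseconds per dependent load for a working set of n
   slots; returns -1.0 if the allocation fails. */
static double ns_per_load(size_t n, size_t steps)
{
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_cycle(next, n);

    volatile size_t sink = chase(next, 0, n);   /* warm-up pass */
    clock_t t0 = clock();
    sink = chase(next, sink, steps);            /* timed pass */
    clock_t t1 = clock();
    (void)sink;

    free(next);
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

Calling ns_per_load with working sets that grow from a few kilobytes to well beyond the last-level cache would typically show the per-load time rising in steps as each cache level overflows. This single-threaded sketch captures none of the contention or coherence effects the abstract mentions.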