Alleviating Memory Contention in Matrix Computations on Large-Scale Shared-Memory Multiprocessors

  • Authors:
  • Ricardo Bianchini;Mark E. Crovella;Leonidas Kontothanassis;Thomas J. LeBlanc

  • Affiliations:
  • -;-;-;-

  • Venue:
  • Alleviating Memory Contention in Matrix Computations on Large-Scale Shared-Memory Multiprocessors
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

Memory contention can be a major source of overhead in large-scale shared-memory multiprocessors. Although there are many hardware solutions to the problem of memory contention, these solutions are often complex and expensive, so software solutions are an attractive alternative. This paper evaluates one particular software solution, called block-column allocation, which is very effective at reducing memory contention for a large class of SPMD (Single-Program-Multiple-Data) programs, and can be implemented easily by the compiler. We first quantify the impact of memory contention on performance by simulating the execution of several application kernels on a large-scale multiprocessor. Our simulation results confirm that memory contention is widespread on large-scale machines; our applications suggest that contention is usually caused by synchronized access to a range of addresses (rather than to a single address). We show that block-column allocation, where each range of addresses is divided into cache lines, and each cache line is allocated to a separate memory module, can nearly eliminate this source of memory contention. As our main contribution, we compare block-column allocation to row-major allocation (a common data allocation scheme) and logarithmic broadcasting (the standard software technique for alleviating memory contention). Our analysis demonstrates the clear superiority of block-column allocation over row-major allocation in the presence of memory contention. Our analysis also indicates that the choice between block-column allocation and logarithmic broadcasting is less clear, as it depends both on the type of synchronization used and the number of processors. We can conclude, however, that on large-scale machines with hundreds of processors, block-column allocation and lock-based synchronization is the most effective combination for reducing memory contention in SPMD matrix computations.