Effectively sharing a cache among threads

  • Authors: Guy E. Blelloch; Phillip B. Gibbons

  • Affiliations: Carnegie Mellon University; Intel Research Pittsburgh

  • Venue: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
  • Year: 2004

Abstract

We compare the number of cache misses M1 for running a computation on a single processor with cache size C1 to the total number of misses Mp for the same computation when using p processors or threads and a shared cache of size Cp. We show that for any computation, and with an appropriate (greedy) parallel schedule, if Cp ≥ C1 + pd then Mp ≤ M1. The depth d of the computation is the length of the critical path of dependences. This gives the perhaps surprising result that for sufficiently parallel computations the shared cache need only be an additive size larger than the single-processor cache, and gives some theoretical justification for designing machines with shared caches.

We model a computation as a DAG and the sequential execution as a depth-first schedule of the DAG. The parallel schedule we study is a parallel depth-first schedule (PDF schedule) based on the sequential one. The schedule is greedy and therefore work-efficient. Our main results assume the Ideal Cache model, but we also present results for other more realistic cache models.
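
The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common formulation of a PDF schedule: at each step, run up to p ready nodes, preferring those that come earliest in the sequential depth-first order. The function names, the adjacency-list DAG encoding, the example DAG, and the choice to measure depth d as the number of nodes on the longest path are all assumptions made for illustration. No cache is simulated; the last function only evaluates the size condition Cp ≥ C1 + pd.

```python
from collections import defaultdict


def _indegrees(children):
    """Count incoming edges for every node of the DAG."""
    indegree = defaultdict(int)
    for targets in children.values():
        for v in targets:
            indegree[v] += 1
    return indegree


def sequential_df_order(children, roots):
    """One sequential depth-first schedule: always run the most recently
    enabled node, taking children left to right."""
    indegree = _indegrees(children)
    stack = list(reversed(roots))
    order = []
    while stack:
        u = stack.pop()
        order.append(u)
        newly_ready = []
        for v in children.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                newly_ready.append(v)
        # Push in reverse so the leftmost newly ready child runs next.
        stack.extend(reversed(newly_ready))
    return order


def pdf_schedule(children, roots, p):
    """Greedy schedule on p processors: each step runs up to p ready
    nodes, earliest in the sequential depth-first order first."""
    priority = {u: i for i, u in enumerate(sequential_df_order(children, roots))}
    indegree = _indegrees(children)
    ready = set(roots)
    steps = []
    while ready:
        chosen = sorted(ready, key=priority.__getitem__)[:p]
        steps.append(chosen)
        ready.difference_update(chosen)
        for u in chosen:
            for v in children.get(u, []):
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.add(v)
    return steps


def depth_in_nodes(children, roots):
    """Critical-path length counted in nodes (an assumed convention):
    with unlimited processors, a greedy schedule takes exactly d steps."""
    node_count = len(sequential_df_order(children, roots))
    return len(pdf_schedule(children, roots, node_count))


def sufficient_shared_cache(c1, p, d):
    """Shared cache size C1 + p*d from the condition in the abstract."""
    return c1 + p * d


# Hypothetical fork-join DAG used only as an example.
dag = {
    "a": ["b", "c"],
    "b": ["d", "e"], "c": ["f", "g"],
    "d": ["h"], "e": ["h"], "f": ["i"], "g": ["i"],
    "h": ["j"], "i": ["j"],
}
order = sequential_df_order(dag, ["a"])
steps = pdf_schedule(dag, ["a"], p=2)
d = depth_in_nodes(dag, ["a"])
cache_size = sufficient_shared_cache(c1=64, p=2, d=d)
```

On this example, the two-processor schedule keeps close to the sequential order: it finishes the left subtree's join before starting most of the right subtree, which is the intuition for why only an additive pd extra cache is needed.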