The cache complexity of multithreaded cache oblivious algorithms

  • Authors:
  • Matteo Frigo;Volker Strumpen

  • Affiliations:
  • IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX

  • Venue:
  • Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
  • Year:
  • 2006

Abstract

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (Pn)^{1/3} n^2)$ cache misses when executed by the Cilk scheduler on a machine with $P$ processors, each with a cache of size $Z$, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations, which incurs $O(n^2/Z + n + \sqrt{P n^{3+\varepsilon}})$ cache misses with high probability.
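For readers unfamiliar with the term, the following is a minimal sequential sketch of a divide-and-conquer matrix multiplication of the kind usually called cache oblivious. It is not the authors' Cilk code: the function names `rec_matmul` and `multiply`, the list-of-lists representation, and the rule of halving the largest dimension are assumptions made for illustration. The comments mark where a multithreaded version could run the two recursive calls in parallel.

```python
def rec_matmul(A, B, C, i0, i1, j0, j1, k0, k1):
    """Accumulate A[i0:i1, k0:k1] * B[k0:k1, j0:j1] into C[i0:i1, j0:j1].

    Hypothetical helper: the recursion never mentions a cache size,
    which is what makes the scheme cache oblivious.
    """
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if di == 0 or dj == 0 or dk == 0:
        return  # empty subproblem
    if di == 1 and dj == 1 and dk == 1:
        C[i0][j0] += A[i0][k0] * B[k0][j0]
        return
    if di >= dj and di >= dk:
        # Split the rows of A and C. The halves write disjoint parts of C,
        # so a multithreaded version could run them in parallel.
        mid = i0 + di // 2
        rec_matmul(A, B, C, i0, mid, j0, j1, k0, k1)
        rec_matmul(A, B, C, mid, i1, j0, j1, k0, k1)
    elif dj >= dk:
        # Split the columns of B and C. Again the halves are independent.
        mid = j0 + dj // 2
        rec_matmul(A, B, C, i0, i1, j0, mid, k0, k1)
        rec_matmul(A, B, C, i0, i1, mid, j1, k0, k1)
    else:
        # Split the shared dimension. Both halves update the same block
        # of C, so this sketch runs them one after the other.
        mid = k0 + dk // 2
        rec_matmul(A, B, C, i0, i1, j0, j1, k0, mid)
        rec_matmul(A, B, C, i0, i1, j0, j1, mid, k1)


def multiply(A, B):
    """Return the product of A (m x n) and B (n x p) as a new matrix."""
    m, n = len(A), len(B)
    p = len(B[0]) if B else 0
    C = [[0] * p for _ in range(m)]
    rec_matmul(A, B, C, 0, m, 0, p, 0, n)
    return C
```

Calling `multiply(A, B)` on two nested lists returns their product as a new nested list. Recursing down to single elements keeps the sketch short; a practical implementation would stop at a larger base case.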