Provably good multicore cache performance for divide-and-conquer algorithms

  • Authors:
      Guy E. Blelloch (Carnegie Mellon University)
      Rezaul A. Chowdhury (University of Texas, Austin)
      Phillip B. Gibbons (Intel Research Pittsburgh)
      Vijaya Ramachandran (University of Texas, Austin)
      Shimin Chen (Intel Research Pittsburgh)
      Michael Kozuch (Intel Research Pittsburgh)

  • Venue: Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year: 2008

Abstract

This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptotically-optimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.
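
The competitiveness claim in the abstract can be written compactly as three inequalities. The notation below is an assumption introduced here for illustration and is not taken from the paper: $Q^{\mathrm{seq}}_{L1}$ and $Q^{\mathrm{seq}}_{L2}$ denote the cache complexity of the standard sequential schedule at each cache level, $Q^{(p)}_{L1}$ and $Q^{(p)}_{L2}$ denote the corresponding quantities under CONTROLLED-PDF on $p$ processors, $T_{\mathrm{seq}}$ and $T_p$ denote sequential and parallel running time, and $c_1, c_2, c_3$ are unspecified constants. The abstract does not state the constants or the cache-size assumptions under which the comparison is made, so this is only a sketch of the shape of the result.

```latex
% Hypothetical notation; the constants c_1, c_2, c_3 and any conditions on
% cache sizes are not given in the abstract.
\begin{align*}
  Q^{(p)}_{L1} &\le c_1 \cdot Q^{\mathrm{seq}}_{L1}
      && \text{(private L1 cache complexity)} \\
  Q^{(p)}_{L2} &\le c_2 \cdot Q^{\mathrm{seq}}_{L2}
      && \text{(shared L2 cache complexity)} \\
  T_p &\le c_3 \cdot \frac{T_{\mathrm{seq}}}{p}
      && \text{(running time on $p$ processors)}
\end{align*}
```

Read together, the three bounds say that, for the class of divide-and-conquer computations considered, the scheduler obtains linear speedup up to a constant factor while keeping the cache complexity at both levels within a constant factor of the sequential schedule.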