Balancing Locality and Parallelism on Shared-cache Mulit-core Systems

  • Authors:
  • Michael Jason Cade;Apan Qasem

  • Affiliations:
  • -;-

  • Venue:
  • HPCC '09 Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The emergence of multi-core systems opens new opportunities for thread-level parallelism and dramatically increases the performance potential of applications running on these systems. However, the state of the art in performance enhancing software is far from adequate in regards to the exploitation of hardware features on this complex new architecture. As a result, much of the performance capabilities of multi-core systems are yet to be realized. This research addresses one facet of this problem by exploring the relationship between data-locality and parallelism in the context of multi-core architectures where one or more levels of cache are shared among the different cores. A model is presented for determining a profitable synchronization interval for concurrent threads that interact in a producer-consumer relationship. Experimental results suggest that consideration of the synchronization window, or the amount of work individual threads can be allowed to do between synchronizations, allows for parallelism- and locality-aware performance optimizations. The optimum synchronization window is a function of the number of threads, data reuse patterns within the workload, and the size and configuration of the last-level of cache that is shared among processing units. By considering these factors, the calculation of the optimum synchronization window incorporates parallelism and data locality issues for maximum performance.