A study of a software cache implementation of the OpenMP memory model for multicore and manycore architectures

  • Authors:
  • Chen Chen; Joseph B. Manzano; Ge Gan; Guang R. Gao; Vivek Sarkar

  • Affiliations:
  • Tsinghua University, Beijing, P.R. China; University of Delaware, Newark, DE; University of Delaware, Newark, DE; University of Delaware, Newark, DE; Rice University, Houston, TX

  • Venue:
  • Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
  • Year:
  • 2010

Abstract

This paper is motivated by the desire to provide an efficient and scalable software cache implementation of OpenMP on multicore and manycore architectures in general, and on the IBM Cell architecture in particular. We propose an instantiation of the OpenMP memory model with the following advantages: (1) It prohibits undefined values, which can cause problems for safety, security, programming, and debugging. (2) It scales with the number of threads because it relies neither on communication among threads nor on a centralized directory that maintains consistency among multiple copies of each shared variable. (3) It avoids the ambiguity of the original memory model definition in the OpenMP Specification 3.0. We also introduce a new cache protocol for this instantiation, which can be implemented as a software-controlled cache. Experimental results on the Cell Broadband Engine show that our instantiation achieves nearly linear speedup with respect to the number of threads for a number of NAS Parallel Benchmarks. The results also show a clear advantage over a software cache design derived from a stronger memory model that maintains a global total ordering among flush operations.
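The abstract does not describe the cache protocol itself, so the following is only a minimal sketch of the general idea behind advantage (2): a per-thread software cache that reaches consistency through its own flush operations, with no messages between threads and no central directory. It is not the authors' protocol. The names `SharedMemory`, `ThreadCache`, `read`, `write`, and `flush` are hypothetical, and the choice to return 0 for a never-written location is an assumption made purely for illustration.

```python
class SharedMemory:
    """Hypothetical backing store standing in for main memory."""

    def __init__(self):
        self.cells = {}

    def load(self, addr):
        # Assumption for this sketch: a never-written location reads as 0.
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        self.cells[addr] = value


class ThreadCache:
    """Hypothetical per-thread software cache with dirty tracking."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # addr -> locally cached value
        self.dirty = set()  # addrs written locally since the last flush

    def read(self, addr):
        # On a miss, fetch from main memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory.load(addr)
        return self.lines[addr]

    def write(self, addr, value):
        # Writes stay local until the next flush.
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Write back only this thread's dirty entries, then drop every
        # local copy so later reads refetch from main memory. No other
        # thread and no directory is consulted.
        for addr in sorted(self.dirty):
            self.memory.store(addr, self.lines[addr])
        self.dirty.clear()
        self.lines.clear()


if __name__ == "__main__":
    mem = SharedMemory()
    producer, consumer = ThreadCache(mem), ThreadCache(mem)
    consumer.read("x")           # consumer caches the initial value
    producer.write("x", 42)      # visible only to the producer so far
    producer.flush()             # written back to main memory
    stale = consumer.read("x")   # consumer still sees its cached copy
    consumer.flush()             # consumer discards its local copies
    fresh = consumer.read("x")   # now refetched from main memory
    print(stale, fresh)
```

In this sketch a write becomes visible to another thread only after the writer flushes and the reader flushes, which loosely mirrors the role of the OpenMP flush operation; a design that imposed a global total order on flushes would need additional coordination that is deliberately absent here.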