The interaction of parallel programming constructs and coherence protocols

  • Authors:
  • Ricardo Bianchini; Enrique V. Carrera; Leonidas Kontothanassis

  • Affiliations:
  • COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Cambridge Research Laboratory, Digital Equipment Corporation, Cambridge, MA

  • Venue:
  • PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 1997

Abstract

Some of the most common parallel programming idioms include locks, barriers, and reduction operations. The interaction of these programming idioms with the multiprocessor's coherence protocol has a significant impact on performance. In addition, the advent of machines that support multiple coherence protocols prompts the question of how best to implement such parallel constructs, that is, which combination of implementation and coherence protocol yields the best performance. In this paper we study the running time and communication behavior of (1) centralized (ticket) and MCS spin locks, (2) centralized, dissemination, and tree-based barriers, and (3) parallel and sequential reductions, under pure and competitive update coherence protocols; results for a write-invalidate protocol are presented mostly for comparison purposes. Our experiments indicate that parallel programming techniques that are well established for write-invalidate protocols, such as MCS locks and parallel reductions, are often inappropriate for update-based protocols. In contrast, techniques such as dissemination and tree barriers achieve superior performance under update-based protocols. Our results also show that the implementation of parallel programming idioms must take the coherence protocol into account, since update-based protocols often lead to different design decisions than write-invalidate protocols. Our main conclusion is that protocol-conscious implementation of parallel programming structures can significantly improve application performance; for multiprocessors that can support more than one coherence protocol, both the protocol and the implementation should be taken into account when exploiting parallel constructs.
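
For readers unfamiliar with the centralized (ticket) lock named in the abstract, the following is a minimal C++ sketch of the general technique, not the implementation evaluated in the paper. The names `TicketLock` and `run_counter_demo` are hypothetical, and the memory orderings are one reasonable choice among several. The relevant point is that every waiter spins on the same shared word (`now_serving_`), so each release is observed by all waiters at once, which is the kind of sharing pattern whose cost depends on the coherence protocol.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Centralized ticket lock (illustrative sketch): two shared counters,
// and every waiting thread spins on the same now_serving_ word.
class TicketLock {
public:
    void lock() {
        // Take the next ticket; fetch_add returns the previous value.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Spin until our ticket is the one being served.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Only the lock holder writes now_serving_, so load + store suffices.
        const unsigned current = now_serving_.load(std::memory_order_relaxed);
        now_serving_.store(current + 1, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical helper: each thread increments a plain counter under the lock
// and the final count is returned.
long run_counter_demo(int num_threads, int iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters_per_thread] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

An MCS lock, by contrast, gives each waiter its own queue node to spin on, so a release touches only the successor's node; the abstract reports that this design, well established for write-invalidate protocols, is often inappropriate under update-based protocols.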