Software Barrier Performance on Dual Quad-Core Opterons

  • Authors:
  • Jie Chen; William Watson III

  • Venue:
  • NAS '08 Proceedings of the 2008 International Conference on Networking, Architecture, and Storage
  • Year:
  • 2008

Abstract

SMP servers based on multi-core processors have become building blocks for Linux clusters in recent years because they can deliver better performance for multi-threaded programs through on-chip multi-threading. However, a relatively slow software barrier can hinder the performance of a data-parallel scientific application on a multi-core system. In this paper, we study the performance of different software barrier algorithms on a server based on the newly introduced AMD quad-core Opteron processors, and how the memory architecture and the cache coherence protocol of the system influence that performance. We present an optimized barrier algorithm derived from the queue-based barrier algorithm. The optimized algorithm achieves a speedup of 1.77 over the original queue-based algorithm and a speedup of 2.39 over the software barrier generated by the Intel OpenMP compiler.
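The abstract does not spell out either the queue-based barrier or its optimized variant, so the following is only a minimal sketch of one common software barrier design that illustrates why memory layout and cache coherence matter: each thread signals arrival on its own cache-line-padded flag, and a master thread then releases everyone through a single flag. The names `PaddedFlag`, `FlagArrayBarrier`, and `run_phases` are hypothetical, the 64-byte line size is an assumption, and nothing here should be read as the authors' algorithm.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One flag per thread, padded to its own cache line (64 bytes is assumed)
// so that one thread's arrival does not invalidate another thread's line.
struct alignas(64) PaddedFlag {
    std::atomic<unsigned> value{0};
};

// Hypothetical flag-array barrier: workers publish the phase number they
// have reached; thread 0 (the master) collects them and publishes a release.
class FlagArrayBarrier {
public:
    explicit FlagArrayBarrier(std::size_t n) : arrive_(n) {}

    // Thread `id` (0..n-1) calls wait() once per phase with its own counter.
    void wait(std::size_t id, unsigned& local_phase) {
        const unsigned phase = ++local_phase;
        if (id == 0) {
            // Master: poll every worker's arrival flag, then release all.
            for (std::size_t i = 1; i < arrive_.size(); ++i)
                while (arrive_[i].value.load(std::memory_order_acquire) != phase)
                    std::this_thread::yield();
            release_.value.store(phase, std::memory_order_release);
        } else {
            // Worker: announce arrival, then spin on the shared release flag.
            arrive_[id].value.store(phase, std::memory_order_release);
            while (release_.value.load(std::memory_order_acquire) != phase)
                std::this_thread::yield();
        }
    }

private:
    std::vector<PaddedFlag> arrive_;
    PaddedFlag release_;
};

// Small self-check: no thread may leave phase p before all threads entered it.
bool run_phases(std::size_t n_threads, unsigned n_phases) {
    FlagArrayBarrier barrier(n_threads);
    std::atomic<unsigned long> arrivals{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (std::size_t id = 0; id < n_threads; ++id) {
        pool.emplace_back([&, id] {
            unsigned phase = 0;
            for (unsigned p = 1; p <= n_phases; ++p) {
                arrivals.fetch_add(1, std::memory_order_relaxed);
                barrier.wait(id, phase);
                if (arrivals.load(std::memory_order_relaxed) < n_threads * p)
                    ok.store(false, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) t.join();
    return ok.load();
}
```

Calling `run_phases(8, 1000)` on an eight-core machine would exercise the barrier with one thread per core. The padding is the design choice of interest: without it, several arrival flags could share a cache line, and each arrival would force coherence traffic between cores that are only trying to signal independently.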