Efficient synchronization for nonuniform communication architectures

  • Authors:
  • Zoran Radović;Erik Hagersten

  • Affiliations:
  • Uppsala University, Uppsala, Sweden;Uppsala University, Uppsala, Sweden

  • Venue:
  • Proceedings of the 2002 ACM/IEEE conference on Supercomputing
  • Year:
  • 2002

Abstract

Scalable parallel computers are often nonuniform communication architectures (NUCAs), where the access time to other processors' caches varies with their physical location. Still, few attempts at exploiting cache-to-cache communication locality have been made. This paper introduces a new kind of synchronization primitive (lock-unlock) that favors neighboring processors when a lock is released. This improves the lock handover time as well as the access time to the shared data of the critical region. A critical section guarded by our new RH lock takes less than half the time to execute compared with the same critical section guarded by any other lock on our NUCA hardware. The execution time for Raytrace with 28 processors was improved 2.23--4.68 times, while global traffic was dramatically decreased compared with all the other locks. Averaged over the seven applications studied, execution time improved by 7--24% and global traffic decreased by 8--28%.
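
The idea of favoring neighboring processors on release can be illustrated with a small sketch. This is not the RH lock algorithm, which the abstract does not describe. It is a minimal toy model, assuming each thread knows the index of the node it runs on and passes it in explicitly. The class name `NodeAffineLock`, the method `waiting`, the helper `demo`, and the `max_local_handovers` fairness bound are all hypothetical names and choices made for illustration.

```python
import threading
import time
from collections import deque


class NodeAffineLock:
    """Toy lock that prefers waiters on the releasing thread's node.

    Illustrative only: not the RH lock from the paper.
    """

    def __init__(self, num_nodes, max_local_handovers=4):
        self._cond = threading.Condition()
        self._queues = [deque() for _ in range(num_nodes)]  # one FIFO per node
        self._held = False
        self._grant = None           # ticket of the waiter chosen as next owner
        self._streak = 0             # consecutive same-node handovers so far
        self._max = max_local_handovers  # assumed bound to avoid starvation

    def waiting(self):
        """Return the total number of queued waiters."""
        with self._cond:
            return sum(len(q) for q in self._queues)

    def acquire(self, node):
        with self._cond:
            if not self._held:
                self._held = True
                return
            ticket = object()
            self._queues[node].append(ticket)
            while self._grant is not ticket:
                self._cond.wait()
            self._grant = None       # ownership was handed over directly

    def release(self, node):
        with self._cond:
            nxt = self._pick(node)
            if nxt is None:
                self._held = False
                self._streak = 0
            else:
                self._grant = nxt    # lock stays held; owner changes
                self._cond.notify_all()

    def _pick(self, node):
        local = self._queues[node]
        if local and self._streak < self._max:
            self._streak += 1
            return local.popleft()   # hand over to a neighbor on the same node
        n = len(self._queues)
        for step in range(1, n + 1):  # scan other nodes, own node last
            q = self._queues[(node + step) % n]
            if q:
                self._streak = 0
                return q.popleft()
        return None


def demo():
    """Queue waiters from nodes 1, 0, 1, 0, then release from node 0."""
    lock = NodeAffineLock(num_nodes=2)
    order = []

    def worker(node):
        lock.acquire(node)
        order.append(node)           # critical section
        lock.release(node)

    lock.acquire(0)
    threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 0, 1, 0)]
    for t in threads:
        t.start()
    while lock.waiting() < 4:        # wait until all four are queued
        time.sleep(0.001)
    lock.release(0)
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(demo())  # prints [0, 0, 1, 1]
```

In the demo, both node-0 waiters are served before either node-1 waiter, regardless of arrival order, so the lock crosses the node boundary once instead of up to three times. On real NUCA hardware the benefit would come from keeping the lock and the protected data in nearby caches; this sketch models only the handover policy, not the memory behavior.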