Scalability of Gaussian 03 on SGI Altix: The Importance of Data Locality on CC-NUMA Architecture

  • Authors:
  • Roberto Gomperts; Michael Frisch; Jean-Pierre Panziera

  • Affiliations:
  • Silicon Graphics, Inc., Sunnyvale, CA 94085, USA; Gaussian, Inc., Wallingford, CT 06492, USA; Silicon Graphics, Inc., Sunnyvale, CA 94085, USA

  • Venue:
  • IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
  • Year:
  • 2009

Abstract

Performance anomalies observed when running Gaussian frequency calculations in parallel on SGI Altix computers with a CC-NUMA memory architecture are analyzed using performance tools that access hardware counters. The bottleneck is that all threads involved in the calculation frequently, and nearly simultaneously, load data allocated on the node where the master thread runs. Code changes that make these data loads local improve performance by a factor close to two. The improvements carry over to other molecular models and other types of calculations. An extension of, or an alternative to, OpenMP's FirstPrivate clause can facilitate the code transformations.
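
The kind of transformation the abstract describes can be illustrated with a minimal sketch. It is not taken from Gaussian 03 (which is written in Fortran); the function names `sum_shared` and `sum_localized` are hypothetical, and the sketch assumes the operating system places a page on the node of the thread that first writes it (first-touch placement), which the abstract does not state. In the first function every thread reads a table that the master thread allocated and filled. In the second, each thread first copies the table into memory it allocates and writes itself, so later loads can be served locally.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical baseline: every thread reads `table`, which was allocated
 * and initialized by the caller (typically the master thread), so under
 * first-touch placement its pages live on the master's node. */
double sum_shared(const double *table, size_t n, int reps)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int r = 0; r < reps; ++r) {
        for (size_t k = 0; k < n; ++k)
            total += table[k];          /* possibly remote loads */
    }
    return total;
}

/* Hypothetical localized variant: each thread makes its own copy inside
 * the parallel region. The copy is first written by the thread that will
 * read it, so (assuming first-touch placement) it lands on that thread's
 * node. This does by hand what firstprivate does for fixed-size arrays;
 * firstprivate on a pointer would copy only the pointer, not the data. */
double sum_localized(const double *table, size_t n, int reps)
{
    double total = 0.0;
    #pragma omp parallel reduction(+:total)
    {
        double *local = malloc(n * sizeof *local);
        if (local != NULL)
            memcpy(local, table, n * sizeof *local);
        /* Fall back to the shared table if the allocation failed. */
        const double *src = (local != NULL) ? local : table;

        #pragma omp for
        for (int r = 0; r < reps; ++r) {
            for (size_t k = 0; k < n; ++k)
                total += src[k];        /* loads from the thread's own copy */
        }
        free(local);
    }
    return total;
}
```

Both functions return the same sum; only the placement of the data being read differs. Compiled without OpenMP support the pragmas are ignored and the code runs serially, so any benefit would only be observable on a multi-node NUMA system with OpenMP enabled. The per-thread copy costs extra memory and one pass over the data, which pays off only when the table is read many times.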