Performance anomalies observed when running Gaussian frequency calculations in parallel on SGI Altix computers with a CC-NUMA memory architecture are analyzed using performance tools that access hardware counters. The bottleneck is that all threads involved in the calculation frequently, and nearly simultaneously, load data allocated on the node where the master thread runs. Code changes that make these data loads local to each thread improve performance by a factor close to two. The improvements carry over to other molecular models and to other types of calculations. An extension of, or an alternative to, OpenMP's FirstPrivate clause could facilitate these code transformations.
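The localization idea can be illustrated with a minimal C/OpenMP sketch. It is not the Gaussian code: the function name `sum_with_local_copies` and its parameters are hypothetical, and it assumes a first-touch page placement policy, under which memory is placed on the node of the thread that first writes it. Each thread copies the read-only table into its own buffer once and then reads only from that copy. In C, `firstprivate` on a pointer copies only the pointer, not the data it refers to, so the copy is written out by hand here; this is one plausible reading of why the abstract asks for an extended clause.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel: sums every entry of a read-only table `repeats` times.
 * `shared_table` is assumed to have been allocated and initialized by the
 * master thread, so on a first-touch NUMA system its pages sit on the
 * master's node. */
double sum_with_local_copies(const double *shared_table, size_t n, int repeats)
{
    double total = 0.0;

    #pragma omp parallel reduction(+:total)
    {
        /* Each thread allocates and fills its own copy. Because this thread
         * is the first to write the buffer, first-touch placement (assumed)
         * puts the pages on the thread's own node. */
        double *local = malloc(n * sizeof *local);
        if (local != NULL)
            memcpy(local, shared_table, n * sizeof *local);

        /* Fall back to the shared data if the allocation failed, so that
         * every thread still reaches the worksharing loop below. */
        const double *src = (local != NULL) ? local : shared_table;

        #pragma omp for
        for (int r = 0; r < repeats; ++r) {
            for (size_t i = 0; i < n; ++i)
                total += src[i];   /* repeated loads now hit local memory */
        }

        free(local);   /* free(NULL) is a no-op */
    }
    return total;
}
```

The one-time copy pays off only when the data is read many times, as in the repeated loads the abstract describes. With GCC the sketch is built with `-fopenmp`; without that flag the pragmas are ignored and the function runs serially with the same result.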