affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system

  • Authors:
  • Henrik Löf;Sverker Holmgren

  • Affiliations:
  • Uppsala University, Uppsala, Sweden;Uppsala University, Uppsala, Sweden

  • Venue:
  • Proceedings of the 19th annual international conference on Supercomputing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The non-uniform memory access times of modern cc-NUMA systems often impair performance for shared memory applications. This is especially true for applications exhibiting complex access patterns. To improve performance, a mechanism for co-locating threads and data during the execution is needed. In this paper, we study how an affinity-on-next-touch procedure can be used to attain this goal. Such a procedure can increase thread-data affinity by migrating data across nodes to better match the access pattern. The migration is triggered by a directive and it can often be implemented as a re-invocation of a standard first-touch page placement procedure. We study an industrial-class scientific application where the thread-data affinity is small due to serial initializations of data structures accessed indirectly. Adding a single affinity-on-next-touch directive, we observed a performance improvement of 69% for 22 threads. We also perform experiments to study the scalability of the affinity-on-next-touch procedure. Our results indicate that the overhead associated with the procedure is highly dependent on the efficiency of the mechanism used to keep TLBs consistent. Using larger but fewer memory pages in the virtual memory sub-system we measured a total performance improvement of 166% compared to the original code.