Optimizing the Barnes-Hut algorithm in UPC

  • Authors:
  • Junchao Zhang;Babak Behzad;Marc Snir

  • Affiliations:
  • University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign

  • Venue:
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

PGAS languages' support of a global name space facilitates the expression of parallel algorithms, since communication is implicit. This is especially convenient when writing irregular applications with data-dependent, dynamically changing communication patterns. However, programming in a shared memory style, with no explicit control of communication, may result in poor performance. The problem may be due to weaknesses of current implementations of PGAS languages or limitations inherent in these languages. To clarify which is the case, we discuss an implementation in UPC of the Barnes-Hut algorithm. A literal port of a good quality shared-memory implementation (merely replacing shared arrays with partitioned global arrays) achieves abysmal performance -- more than 1000 times worse than a message-passing implementation. We achieve in UPC a performance comparable to message-passing with a series of optimizations. Most of these optimizations could be performed with limited changes in the source code using an enhanced run-time and a few language extensions or pragmas. We discuss the implications to the programmer, the compiler and PGAS languages themselves.