Partial globalization of partitioned address spaces for zero-copy communication with shared memory

  • Authors:
  • Fangzhou Jiao;Nilesh Mahajan;Jeremiah Willcock;Arun Chauhan;Andrew Lumsdaine

  • Affiliations:
  • School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405;School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405;School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405;School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405;School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405

  • Venue:
  • HIPC '11 Proceedings of the 2011 18th International Conference on High Performance Computing
  • Year:
  • 2011

Abstract

We have developed a high-level language, called Kanor, for declaratively specifying communication in parallel programs. Designed as an extension of C++, it serves to coordinate partitioned address space programs written in the bulk synchronous parallel (BSP) style. Kanor's declarative semantics enable programmers to write correct and maintainable parallel applications. The communication abstraction has been carefully designed to be amenable to compiler optimizations. While partitioned address space programming has several advantages, it needs special compiler optimizations to effectively leverage shared-memory hardware when running on multicore machines. In this paper, we introduce such shared-memory optimizations in the context of Kanor. One major way we achieve these optimizations is by selectively moving some of the variables into a globally shared address space, a process that we term partial globalization. We identify scenarios in which such a transformation is beneficial, and present an algorithm to identify and correctly transform Kanor communication steps into zero-copy communication using hardware shared memory, introducing minimal synchronization. We then present a runtime strategy that complements the compiler algorithm to eliminate most of the runtime synchronization overheads by using a copy-on-conflict technique. Finally, we show that our solution often performs much better than shared-memory optimized MPI, and never performs significantly worse than MPI even in the presence of dependencies introduced by buffer sharing. The techniques in this paper demonstrate that it is possible to program in a partitioned address space style without sacrificing the performance advantages of hardware shared memory. To the best of our knowledge, no other automatic compiler techniques have been developed so far that achieve zero-copy communication from a partitioned address space program. We expect our results to be applicable beyond Kanor, to other partitioned address space programming environments, such as MPI.