X10 and APGAS at Petascale

  • Authors:
  • Olivier Tardieu; Benjamin Herta; David Cunningham; David Grove; Prabhanjan Kambadur; Vijay Saraswat; Avraham Shinnar; Mikio Takeuchi; Mandana Vaziri

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; Google Inc., New York, NY, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; IBM Research, Tokyo, Japan; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

  • Venue:
  • Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • Year:
  • 2014

Abstract

X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes. We demonstrate that X10 delivers solid performance at petascale by running (weak scaling) eight application kernels on an IBM Power 775 supercomputer utilizing up to 55,680 Power7 cores (for 1.7 Pflop/s of theoretical peak performance). We detail our advances in distributed termination detection, distributed load balancing, and use of high-performance interconnects that enable X10 to scale out to tens of thousands of cores. For the four HPC Class 2 Challenge benchmarks, X10 achieves 41% to 87% of the system's potential at scale (as measured by IBM's HPCC Class 1 optimized runs). We also implement K-Means, Smith-Waterman, Betweenness Centrality, and Unbalanced Tree Search (UTS) for geometric trees. Our UTS implementation is the first to scale to petaflop systems.