X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, which supports the same fine-grained concurrency mechanisms within and across shared-memory nodes. We demonstrate that X10 delivers solid performance at petascale by running eight application kernels, in weak-scaling mode, on an IBM Power 775 supercomputer using up to 55,680 Power7 cores (1.7 Pflop/s of theoretical peak performance). We detail our advances in distributed termination detection, distributed load balancing, and the use of high-performance interconnects, which together enable X10 to scale out to tens of thousands of cores. For the four HPC Class 2 Challenge benchmarks, X10 achieves 41% to 87% of the system's potential at scale (as measured by IBM's HPCC Class 1 optimized runs). We also implement K-Means, Smith-Waterman, Betweenness Centrality, and Unbalanced Tree Search (UTS) for geometric trees. Our UTS implementation is the first to scale to petaflop systems.