BlueGene/L applications: Parallelism On a Massive Scale

Authors:
Bronis R. De Supinski;Martin Schulz;Vasily V. Bulatov;William Cabot;Bor Chan;Andrew W. Cook;Erik W. Draeger;James N. Glosli;Jeffrey A. Greenough;Keith Henderson;Alison Kubota;Steve Louis;Brian J. Miller;Mehul V. Patel;Thomas E. Spelce;Frederick H. Streitz;Peter L. Williams;Robert K. Yates;Andy Yoo;George Almasi;Gyan Bhanot;Alan Gara;John A. Gunnels;Manish Gupta;Jose Moreira;James Sexton;Bob Walkup;Charles Archer;Francois Gygi;Timothy C. Germann;Kai Kadau;Peter S. Lomdahl;Charles Rendleman;Michael L. Welcome;William Mclendon;Bruce Hendrickson;Franz Franchetti;Stefan Kral;Jürgen Lorenz;Christoph W. Überhuber;Edmond Chow;Ümit Çatalyürek
Affiliations:
LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;LAWRENCE LIVERMORE NATIONAL LABORATORY, LIVERMORE, CA 94551, USA;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM THOMAS J. WATSON RESEARCH CENTER;IBM SYSTEMS AND TECHNOLOGY GROUP;UNIVERSITY OF CALIFORNIA, DAVIS;LOS ALAMOS NATIONAL LABORATORY;LOS ALAMOS NATIONAL LABORATORY;LOS ALAMOS NATIONAL LABORATORY;LAWRENCE BERKELEY NATIONAL LABORATORY;LAWRENCE BERKELEY NATIONAL LABORATORY;SANDIA NATIONAL LABORATORIES;SANDIA NATIONAL LABORATORIES;CARNEGIE MELLON UNIVERSITY;VIENNA UNIVERSITY OF TECHNOLOGY;VIENNA UNIVERSITY OF TECHNOLOGY;VIENNA UNIVERSITY OF TECHNOLOGY;D. E. SHAW RESEARCH AND DEVELOPMENT;OHIO STATE UNIVERSITY
Venue:
International Journal of High Performance Computing Applications
Year:
2008

Abstract

BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 Tflop/s. BG/L has led the last four Top500 lists with a Linpack rate of 280.6 Tflop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine such as BG/L derives from the scientific breakthroughs that real applications can produce by successfully using its unprecedented scale and computational power. In this paper, we describe our experiences with eight large-scale applications on BG/L from several application domains, ranging from molecular dynamics to dislocation dynamics and turbulence simulations to searches in semantic graphs. We also discuss the challenges we faced when scaling these codes and present several successful optimization techniques. All applications show excellent scaling behavior, even at very large processor counts, and one code achieves a sustained performance of more than 100 Tflop/s, clearly demonstrating the real success of the BG/L design.