Efficient parallel algorithms for computing all pair shortest paths in directed graphs
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
ScaLAPACK user's guide
A randomized parallel algorithm for single-source shortest paths
Journal of Algorithms
A Parallelization of Dijkstra's Shortest Path Algorithm
MFCS '98 Proceedings of the 23rd International Symposium on Mathematical Foundations of Computer Science
An overview of the BlueGene/L Supercomputer
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Scalable Line Dynamics in ParaDiS
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A Performance and Scalability Analysis of the BlueGene/L Architecture
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Evaluating high-performance computers: Research Articles
Concurrency and Computation: Practice & Experience - The High Performance Architectural Challenge: Mass Market versus Proprietary Components?
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Tera-Scalable Algorithms for Variable-Density Elliptic Hydrodynamics with Spectral Accuracy
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Scalable dynamic binary instrumentation for Blue Gene/L
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
Quantifying the effectiveness of load balance algorithms
Proceedings of the 26th ACM international conference on Supercomputing
Hi-index | 0.00 |
BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 Tflop/s. BG/L has led the last four Top500 lists with a Linpack rate of 280.6 Tflop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine such as BG/L derives from the scientific breakthroughs that real applications can produce by successfully using its unprecedented scale and computational power. In this paper, we describe our experiences with eight large scale applications on BG/ L from several application domains, ranging from molecular dynamics to dislocation dynamics and turbulence simulations to searches in semantic graphs. We also discuss the challenges we faced when scaling these codes and present several successful optimization techniques. All applications show excellent scaling behavior, even at very large processor counts, with one code even achieving a sustained performance of more than 100 Tflop/s, clearly demonstrating the real success of the BG/L design.