Roadrunner is a hybrid-architecture supercomputer developed by LANL and IBM, with a peak double-precision performance of 1.38 Pflop/s. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and an in-depth performance analysis of the system. We also include a case study of optimizing Sweep3D, an MPI-based application, to exploit Roadrunner's hybrid architecture. The performance of Sweep3D on Roadrunner is compared with that of the same code on multi-core processors and on the Cell BE, the previous implementation of the Cell Broadband Engine architecture. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system. From these results we infer how well the final Roadrunner configuration will perform once the system software stack has matured.
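The sketch below checks the system figures quoted above for consistency. It divides the component counts by the node count and estimates the peak double-precision rate. The counts come from the abstract. The per-chip rates are assumptions based on nominal clock speeds and are not stated in the text:

- 8 SPEs per PowerXCell 8i at 12.8 Gflop/s each
- 6.4 Gflop/s for each chip's PPE
- 3.6 Gflop/s per 1.8 GHz Opteron core

```python
# Back-of-envelope check of Roadrunner's system-level figures.
# Component counts are taken from the abstract; the per-chip peak
# rates below are ASSUMPTIONS based on nominal clock speeds.

NODES = 3_060
CELL_CHIPS = 12_240      # IBM PowerXCell 8i processors
OPTERON_CORES = 12_240   # AMD Opteron cores

# Assumed double-precision peak rates, in Gflop/s.
SPE_DP_GFLOPS = 12.8           # per SPE (assumed, 3.2 GHz)
SPES_PER_CELL = 8
PPE_DP_GFLOPS = 6.4            # per PPE, one per chip (assumed)
OPTERON_CORE_DP_GFLOPS = 3.6   # 2 flops/cycle x 1.8 GHz (assumed)


def per_node(total: int, nodes: int = NODES) -> int:
    """Return the per-node count; raise if the total does not divide evenly."""
    count, remainder = divmod(total, nodes)
    if remainder:
        raise ValueError(f"{total} does not divide evenly across {nodes} nodes")
    return count


def cell_peak_gflops() -> float:
    """Aggregate peak of all PowerXCell 8i chips (SPEs plus PPE), in Gflop/s."""
    return CELL_CHIPS * (SPES_PER_CELL * SPE_DP_GFLOPS + PPE_DP_GFLOPS)


def system_peak_pflops() -> float:
    """Estimated system peak in Pflop/s: Cell chips plus Opteron cores."""
    opteron = OPTERON_CORES * OPTERON_CORE_DP_GFLOPS
    return (cell_peak_gflops() + opteron) / 1e6  # Gflop/s -> Pflop/s


if __name__ == "__main__":
    print("PowerXCell 8i per node:", per_node(CELL_CHIPS))
    print("Opteron cores per node:", per_node(OPTERON_CORES))
    print(f"Estimated peak: {system_peak_pflops():.2f} Pflop/s")
```

The estimate gives four PowerXCell 8i chips and four Opteron cores per node. Under the assumed rates, the peak comes to about 1.376 Pflop/s, which rounds to the quoted 1.38 Pflop/s. Roughly 97% of that peak comes from the PowerXCell 8i chips.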