On June 26, 2007, IBM announced the Blue Gene/P™ system as the leading offering in its massively parallel Blue Gene® supercomputer line, succeeding the Blue Gene/L™ system. The Blue Gene/P system is designed to scale to at least 262,144 quad-processor nodes, with a peak performance of 3.56 petaflops. More significantly, the Blue Gene/P system enables this unprecedented scaling via architectural and design choices that maximize performance per watt, performance per square foot, and mean time between failures. This paper describes our vision of this petascale system, that is, a system capable of delivering more than a quadrillion (10^15) floating-point operations per second. We also provide an overview of the system architecture, packaging, system software, and initial benchmark results.
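As a rough sanity check on the peak figure quoted above, the following sketch multiplies out the node count from the abstract. The 850 MHz clock rate and the 4 floating-point operations per cycle per core are assumptions based on commonly published Blue Gene/P specifications and are not stated in the abstract itself; the function and constant names are purely illustrative.

```python
# Back-of-the-envelope estimate of Blue Gene/P peak performance.
# NODES and CORES_PER_NODE follow the abstract; the other two constants
# are assumptions (commonly published figures, not given in the abstract).
NODES = 262_144          # maximum node count quoted in the abstract
CORES_PER_NODE = 4       # "quad-processor nodes" in the abstract
CLOCK_HZ = 850e6         # assumption: 850 MHz core clock
FLOPS_PER_CYCLE = 4      # assumption: dual FPU, two fused multiply-adds per cycle


def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak: every core retires its maximum flops on every cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


peak = peak_flops(NODES, CORES_PER_NODE, CLOCK_HZ, FLOPS_PER_CYCLE)
peak_petaflops = peak / 1e15  # 1 petaflops = 10^15 flops

print(f"Estimated peak: {peak_petaflops:.3f} petaflops")
print(f"Petascale (more than 10^15 flops): {peak > 1e15}")
```

Under these assumptions the estimate comes to roughly 3.565 petaflops, which agrees with the quoted 3.56 petaflops once truncated to two decimal places. It is a theoretical upper bound, not a measured benchmark result.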