This paper gives an overview of the BlueGene/L Supercomputer. The BlueGene/L project is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver a target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame, at price/performance and power consumption/performance targets unobtainable with conventional architectures.
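
As a rough sanity check on the figures quoted above, the short sketch below divides the 360 teraFLOPS target by the 65,536 nodes to estimate the average peak rate each node would need to contribute. It assumes the target peak is spread evenly across all nodes, which the abstract does not state; the function name `per_node_gigaflops` is a hypothetical helper introduced only for this illustration.

```python
# Figures quoted in the abstract.
NODES = 65_536
TARGET_PEAK_TERAFLOPS = 360


def per_node_gigaflops(total_teraflops: float, nodes: int) -> float:
    """Average peak per node in gigaFLOPS.

    Assumption (not from the abstract): the system peak is divided
    evenly across all nodes.
    """
    # 1 teraFLOPS = 1,000 gigaFLOPS
    return total_teraflops * 1_000 / nodes


if __name__ == "__main__":
    rate = per_node_gigaflops(TARGET_PEAK_TERAFLOPS, NODES)
    print(f"{rate:.2f} gigaFLOPS per node")  # prints "5.49 gigaFLOPS per node"
```

Under that even-split assumption, each node would need to supply roughly 5.5 gigaFLOPS of peak performance to reach the stated system target.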