Graph algorithms are becoming increasingly important for biology, transportation, business intelligence, and a wide range of commercial workloads. Most graph algorithms push several architectural aspects of conventional machines to their limits. Their memory access patterns are irregular, with little spatial locality and little data reuse. The amount of computation per loaded byte is very small and typically involves bit manipulation, and pointer-chasing is often the norm. Likewise, the network traffic they generate consists of small packets sent to random destinations at a very high messaging rate. With our recent winning Graph 500 submissions in November 2010, June 2011, and November 2011, we have demonstrated the versatility of the IBM Blue Gene® family of supercomputers and the feasibility of using them to parallelize demanding data-intensive applications. In this paper, we describe the algorithmic techniques that we used to map the Graph 500 breadth-first search (BFS) exploration onto the IBM Blue Gene®/Q, achieving a performance of 254 billion traversed edges per second.
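To make the access pattern described above concrete, the following is a minimal single-node sketch of a level-synchronous BFS over a compressed sparse row (CSR) graph. It is an illustration under stated assumptions, not the distributed Blue Gene/Q implementation the paper describes: the names `build_csr` and `bfs_levels` are hypothetical, and the count of scanned adjacency entries is only a rough stand-in for the edge count that a traversed-edges-per-second figure is based on.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build an undirected CSR graph (hypothetical helper for this sketch)."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # row_ptr[v] .. row_ptr[v + 1] delimits the neighbours of v in col_idx.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + degree[v]

    col_idx = [0] * row_ptr[-1]
    cursor = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
        col_idx[cursor[v]] = u
        cursor[v] += 1
    return row_ptr, col_idx


def bfs_levels(row_ptr: List[int], col_idx: List[int], root: int) -> Tuple[List[int], int]:
    """Level-synchronous BFS; returns the parent array and scanned entry count."""
    num_vertices = len(row_ptr) - 1
    parent = [-1] * num_vertices  # -1 marks an unvisited vertex
    parent[root] = root
    frontier = [root]
    scanned = 0

    while frontier:
        next_frontier = []
        for u in frontier:
            # Each neighbour lookup lands at an effectively random index of
            # parent[]: little spatial locality and almost no arithmetic.
            for i in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[i]
                scanned += 1
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, scanned


if __name__ == "__main__":
    row_ptr, col_idx = build_csr(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
    parent, scanned = bfs_levels(row_ptr, col_idx, root=0)
    print(parent, scanned)
```

The inner loop does one comparison and at most one store per loaded neighbour, which is the low computation-per-byte behaviour noted above. Dividing an edge count by the wall-clock time of the search gives a rate in traversed edges per second.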