Massive data analytics: The Graph 500 on IBM Blue Gene/Q

Authors:
F. Checconi; F. Petrini
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY
Venue:
IBM Journal of Research and Development
Year:
2013

Abstract

Graph algorithms are becoming increasingly important for biology, transportation, business intelligence, and a wide range of commercial workloads. Most graph algorithms push several architectural aspects of conventional machines to their limits. Memory access patterns are irregular, with little spatial locality or data reuse. The amount of computation per loaded byte is very small and typically involves bit manipulation, and pointer chasing is often the norm. Likewise, the generated network traffic consists of small packets sent to random destinations at a very high messaging rate. With our recent winning Graph 500 submissions in November 2010, June 2011, and November 2011, we have demonstrated the versatility of the IBM Blue Gene® family of supercomputers and the feasibility of using them to parallelize demanding data-intensive applications. In this paper, we describe the algorithmic techniques that we used to map the Graph 500 breadth-first search (BFS) exploration onto the IBM Blue Gene®/Q, achieving a performance of 254 billion traversed edges per second.
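
To make the "traversed edges per second" figure concrete, the following is a minimal single-process sketch of a BFS and a rate computed in the spirit of the Graph 500 metric. It is not the authors' Blue Gene/Q implementation, which is distributed and heavily optimized. The function names (bfs_parents, traversed_edges, teps), the dictionary-based adjacency list, and the assumption of a simple undirected graph without self-loops are illustrative choices only.

```python
from collections import deque
import time


def bfs_parents(adj, root):
    """Return a parent map for every vertex reachable from root.

    adj is assumed to be a dict mapping each vertex to a list of
    neighbours, with every undirected edge stored in both directions.
    """
    parent = {root: root}          # the root is its own parent
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):   # irregular, data-dependent accesses
            if v not in parent:    # first visit decides the parent
                parent[v] = u
                frontier.append(v)
    return parent


def traversed_edges(adj, parent):
    """Count undirected edges inside the component reached by the BFS.

    Each edge appears twice in adj (once per endpoint), hence the // 2.
    Assumes no self-loops.
    """
    return sum(len(adj[u]) for u in parent) // 2


def teps(adj, root):
    """Traversed edges per second for one BFS from root (illustrative)."""
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small graphs.
    return traversed_edges(adj, parent) / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Hypothetical toy graph: a 4-cycle plus a separate 2-vertex component.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [5], 5: [4]}
    tree = bfs_parents(graph, 0)
    print("edges in traversed component:", traversed_edges(graph, tree))
    print("TEPS on this machine:", teps(graph, 0))
```

Even in this toy form, the inner loop does almost no arithmetic per memory access and its access pattern depends entirely on the graph, which is the behaviour the abstract describes as stressing conventional architectures.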