Breaking the speed and scalability barriers for graph exploration on distributed-memory machines

  • Authors:
  • Fabio Checconi;Fabrizio Petrini;Jeremiah Willcock;Andrew Lumsdaine;Anamitra Roy Choudhury;Yogish Sabharwal

  • Affiliations:
  • IBM TJ Watson, Yorktown Heights, NY;IBM TJ Watson, Yorktown Heights, NY;CREST, Indiana University, Bloomington, IN;CREST, Indiana University, Bloomington, IN;IBM India Research, New Delhi, DL, India;IBM India Research, New Delhi, DL, India

  • Venue:
  • SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2012

Abstract

In this paper, we describe the challenges involved in designing a family of highly-efficient Breadth-First Search (BFS) algorithms and in optimizing these algorithms on the latest two generations of Blue Gene machines, Blue Gene/P and Blue Gene/Q. With our recent winning Graph 500 submissions in November 2010, June 2011, and November 2011, we have achieved unprecedented scalability results in both space and size. On Blue Gene/P, we have been able to parallelize a scale 38 problem with 2^38 vertices and 2^42 edges on 131,072 processing cores. Using only four racks of an experimental configuration of Blue Gene/Q, we have achieved a processing rate of 254 billion edges per second on 65,536 processing cores. This paper describes the algorithmic design and the main classes of optimizations that we have used to achieve these results.
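
To put the figures quoted in the abstract in perspective, the short Python sketch below works through the arithmetic. It assumes the Graph 500 default edge factor of 16, which the abstract does not state but which matches a scale 38 problem having 2^38 vertices and 2^42 edges. The helper names (`graph500_size`, `per_core`) are illustrative only and do not come from the paper.

```python
# Back-of-the-envelope arithmetic for the numbers quoted in the abstract.
# ASSUMPTION: edge factor 16 (the Graph 500 default); not stated in the abstract.
EDGE_FACTOR = 16


def graph500_size(scale, edge_factor=EDGE_FACTOR):
    """Return (vertices, edges) for a Graph 500 problem of the given scale."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


def per_core(total, cores):
    """Average share of a total quantity handled by each processing core."""
    return total / cores


# Blue Gene/P run: scale 38 on 131,072 cores.
bgp_cores = 131_072
vertices, edges = graph500_size(38)
print("vertices per core:", per_core(vertices, bgp_cores))
print("edges per core:   ", per_core(edges, bgp_cores))

# Blue Gene/Q run: 254 billion edges per second on 65,536 cores.
bgq_cores = 65_536
bgq_rate = 254e9
print("edges/s per core: ", per_core(bgq_rate, bgq_cores))
```

Under this assumption, each Blue Gene/P core holds on the order of two million vertices, and each Blue Gene/Q core sustains a few million traversed edges per second on average.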