GraphStep: A System Architecture for Sparse-Graph Algorithms

Authors:
Michael deLorimier;Nachiket Kapre;Nikil Mehta;Dominic Rizzo;Ian Eslick;Raphael Rubin;Tomas E. Uribe;Thomas F. Knight, Jr.;Andre DeHon
Affiliations:
California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology;California Institute of Technology
Venue:
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Year:
2006

Abstract

Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this "memory wall," we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
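
The organization described in the abstract can be illustrated with a small software model. The following is a minimal sketch only, not the authors' hardware design or API: it assumes a round-robin assignment of graph nodes to processing elements (PEs), a simple decay-weighted spreading-activation rule, and a fixed number of synchronous steps. The names `partition`, `spreading_activation`, `num_pes`, `decay`, and `steps` are hypothetical and chosen for illustration. Each PE reads only its own local memory, and all communication between PEs happens through per-PE message inboxes that stand in for the lightweight network.

```python
from collections import defaultdict


def partition(nodes, num_pes):
    """Assign each node to a PE round-robin (an assumption, not the paper's placement)."""
    return {node: i % num_pes for i, node in enumerate(sorted(nodes))}


def spreading_activation(edges, seeds, num_pes=4, decay=0.5, steps=2):
    """Propagate activation through a sparse graph in synchronous steps.

    edges: dict mapping node -> list of (neighbor, weight) pairs
    seeds: dict mapping node -> initial activation
    Returns a dict mapping every node to its final activation.
    """
    nodes = set(edges) | {n for outs in edges.values() for n, _ in outs} | set(seeds)
    owner = partition(nodes, num_pes)

    # Each PE's local memory holds only the state and out-edges of the nodes it owns.
    local = [dict() for _ in range(num_pes)]
    for node in nodes:
        local[owner[node]][node] = {
            "act": seeds.get(node, 0.0),
            "out": edges.get(node, []),
        }

    frontier = set(seeds)  # nodes whose activation changed in the previous step
    for _ in range(steps):
        # Send phase: every PE scans its own memory and emits messages
        # addressed to the PE that owns each neighbor.
        inbox = [defaultdict(float) for _ in range(num_pes)]
        for pe in range(num_pes):
            for node, state in local[pe].items():
                if node in frontier:
                    for nbr, weight in state["out"]:
                        inbox[owner[nbr]][nbr] += state["act"] * weight * decay

        # Receive phase: every PE applies the messages delivered to its own nodes.
        frontier = set()
        for pe in range(num_pes):
            for node, delta in inbox[pe].items():
                local[pe][node]["act"] += delta
                frontier.add(node)

    return {node: state["act"] for mem in local for node, state in mem.items()}


# Toy graph (hypothetical data, not taken from ConceptNet).
toy_edges = {"a": [("b", 1.0), ("c", 0.5)], "b": [("c", 1.0)]}
result = spreading_activation(toy_edges, {"a": 1.0})
print(result["a"], result["b"], result["c"])  # 1.0 0.5 0.5
```

Because every update touches only the memory of the PE that owns the node, the result does not depend on how many PEs are used; in this model, changing `num_pes` changes only how the work is divided, which is the property that lets the graph be spread over many small memories.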