Spatial hardware implementation for sparse graph algorithms in GraphStep

Authors:
Michael deLorimier; Nachiket Kapre; Nikil Mehta; André DeHon
Affiliations:
University of Pennsylvania, Philadelphia; University of Pennsylvania, Philadelphia; University of Pennsylvania, Philadelphia; University of Pennsylvania, Philadelphia
Venue:
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Year:
2011

Abstract

How do we develop programs that are easy to express, easy to reason about, and able to achieve high performance on massively parallel machines? To address this problem, we introduce GraphStep, a domain-specific compute model that captures algorithms that act on static, irregular, sparse graphs. In GraphStep, algorithms are expressed directly, without requiring the programmer to explicitly manage parallel synchronization, operation ordering, placement, or scheduling details. Problems in the sparse graph domain are usually highly concurrent and communicate along graph edges. Exposing this concurrency and communication structure enables the scheduling of parallel operations and the management of communication that are necessary for performance on a spatial computer. We introduce a language syntax for GraphStep applications and study the performance of three of them: a semantic network application, a shortest-path application, and a max-flow/min-cut application. The total speedup over sequential versions of the applications studied ranges from a factor of 19 to a factor of 15,000. Spatially aware graph optimizations (e.g., node decomposition, placement, and route scheduling) deliver speedups of 3 to 30 times over a spatially oblivious mapping.
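To make the style of computation concrete, the following is a minimal sequential sketch of a shortest-path computation in which nodes communicate only along graph edges and all nodes advance in synchronized steps. It is an illustration only: it is not the GraphStep language syntax, it does not model a spatial machine, and the synchronized-step structure is an assumption, since the abstract does not spell out the execution model. The names `graph_step_shortest_paths`, `inbox`, `outbox`, and `EXAMPLE_GRAPH` are hypothetical, and edge weights are assumed to be non-negative so that the loop terminates.

```python
from collections import defaultdict
import math


def graph_step_shortest_paths(edges, source):
    """Single-source shortest paths by message passing along edges.

    edges: dict mapping a node to a list of (neighbor, weight) pairs.
    Assumes non-negative weights; a negative cycle would never terminate.
    Returns (distances, number_of_steps).
    """
    # Collect every node, including those that only appear as destinations.
    nodes = set(edges)
    for outgoing in edges.values():
        for dst, _ in outgoing:
            nodes.add(dst)

    dist = {n: math.inf for n in nodes}
    inbox = {source: [0]}  # messages to be delivered in the next step
    steps = 0

    while inbox:
        outbox = defaultdict(list)
        # Within one step every node with pending messages updates
        # independently; on parallel hardware these updates could run
        # concurrently because each touches only its own node state.
        for node, messages in inbox.items():
            best = min(messages)  # reduce all incoming messages
            if best < dist[node]:
                dist[node] = best
                # Communicate only along the node's outgoing edges.
                for dst, weight in edges.get(node, []):
                    outbox[dst].append(best + weight)
        inbox = outbox  # step boundary: deliver all messages at once
        steps += 1

    return dist, steps


# Hypothetical example graph, not taken from the paper.
EXAMPLE_GRAPH = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 7)],
    "b": [("d", 1)],
}

if __name__ == "__main__":
    distances, _ = graph_step_shortest_paths(EXAMPLE_GRAPH, "a")
    for node in sorted(distances):
        print(node, distances[node])
```

The sketch never states when or where each node update runs; it only describes what a node does with its incoming messages and what it sends along its edges. That separation is what leaves ordering, placement, and scheduling to be handled outside the algorithm description.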