Graph algorithms such as vertex reachability, transitive closure, and shortest path are fundamental to many computing applications. We address the question of how to use the bit-level parallelism available in hardware, and specifically in FPGAs, to implement such graph algorithms for speedup relative to their software counterparts. This paper generalizes the idea of a data structure residing in reconfigurable hardware that, along with support logic and software in a microprocessor, accelerates a core algorithm. We give two examples of this idea. First, we draw parallels to content-addressable memories. Second, we show how to extend the idea of mapping the adjacency matrix representation of a graph to a HArdware Graph ARray (HAGAR). We describe HAGAR implementations for graph reachability and shortest path. Reachability is a building block that can further be used to implement transitive closure, connected components, and other high-level graph algorithms. To handle large graphs, where such an approach can excel relative to software, we develop a methodology that uses small FPGA-internal RAM blocks to store and switch between multiple contexts of a regular architecture. The proposed circuits are implemented within the PAM-Blox module generation environment using Compaq's PamDC, and run on an FPGA accelerator card.
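To make the adjacency-matrix idea concrete, the following is a minimal software sketch of reachability over bit-packed adjacency rows. It is only an analogue of the bit-level parallelism the abstract refers to and does not reproduce the HAGAR circuit, PAM-Blox, or PamDC. The function names (`make_adjacency_rows`, `reachable_from`, `transitive_closure`) and the choice of Python integers as bit vectors are assumptions made for illustration.

```python
def make_adjacency_rows(n, edges):
    """Build a bit-packed adjacency matrix for a directed graph.

    Row i is an int whose bit j is set when there is an edge i -> j.
    (Hypothetical helper, not taken from the paper.)
    """
    rows = [0] * n
    for src, dst in edges:
        rows[src] |= 1 << dst
    return rows


def reachable_from(rows, start):
    """Return a bitmask of the vertices reachable from `start`.

    The start vertex itself is included in the result.
    """
    reached = 1 << start
    frontier = reached
    while frontier:
        next_mask = 0
        remaining = frontier
        # OR together the adjacency rows of every vertex in the frontier.
        # Each OR handles a whole row of matrix bits in one operation.
        while remaining:
            low = remaining & -remaining      # isolate the lowest set bit
            vertex = low.bit_length() - 1     # index of that bit
            next_mask |= rows[vertex]
            remaining ^= low
        frontier = next_mask & ~reached       # keep only newly found vertices
        reached |= frontier
    return reached


def transitive_closure(rows):
    """Reflexive transitive closure built from repeated reachability queries."""
    return [reachable_from(rows, v) for v in range(len(rows))]


if __name__ == "__main__":
    rows = make_adjacency_rows(5, [(0, 1), (1, 2), (3, 4)])
    for v, mask in enumerate(transitive_closure(rows)):
        print(v, format(mask, "05b"))
```

In this sketch the loop over frontier vertices is sequential; it stands in for the per-row work that a hardware array could perform concurrently. Building the closure from reachability mirrors the abstract's remark that reachability serves as a building block for higher-level graph algorithms.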