We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full-custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.