Programming models help developers build high-performance computing systems by providing a higher-level abstraction of the target platform. OpenCL has emerged as a standard programming model for heterogeneous systems, and there has been recent activity combining OpenCL and FPGAs. This work introduces a memory infrastructure for FPGAs that is designed for OpenCL-style computation and complements previous work. An Aggregating Memory Controller is implemented in hardware; it aims to maximize bandwidth to large, external, high-latency, high-bandwidth memories by finding the minimal number of external memory burst requests needed to serve a vector of requests. A template processing array with soft-processor and hand-coded hardware elements was also designed to drive the memory controller. The Aggregating Memory Controller is described in terms of its operation and future scalability, and the processing array is described as a flexible structure that can support many types of processing solutions. A hardware prototype of the memory controller and processing array was implemented on a Virtex-5 LX110T FPGA. Two micro-benchmarks were run on both the soft-processor elements and the hand-coded hardware cores to exercise the memory controller. Results for effective memory bandwidth within the system show that the high memory latency can be hidden by the Aggregating Memory Controller when the number of threads within the processing array is increased.
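The aggregation idea can be illustrated in software. The following is a minimal sketch only, not the hardware design described in the paper: it assumes that bursts are fixed-size and aligned to their own size, and that each thread issues one byte address per cycle. The function name `aggregate_requests` and the parameter `burst_bytes` are hypothetical and chosen for illustration. Under those assumptions, the minimal number of burst requests equals the number of distinct aligned burst blocks touched by the request vector.

```python
def aggregate_requests(addresses, burst_bytes=64):
    """Group a vector of per-thread byte addresses by aligned burst block.

    Assumption (not from the paper): bursts are burst_bytes long and
    aligned to burst_bytes. Returns a dict that maps each burst base
    address to the list of thread indices served by that burst.
    """
    if burst_bytes <= 0:
        raise ValueError("burst_bytes must be positive")
    bursts = {}
    for thread_id, addr in enumerate(addresses):
        # Round the address down to the start of its burst block.
        base = (addr // burst_bytes) * burst_bytes
        bursts.setdefault(base, []).append(thread_id)
    return bursts


# Sixteen threads reading consecutive 4-byte words share a single burst.
unit_stride = [0x1000 + 4 * i for i in range(16)]
# Sixteen threads with a 256-byte stride each need their own burst.
scattered = [0x1000 + 256 * i for i in range(16)]

print(len(aggregate_requests(unit_stride)))  # 1
print(len(aggregate_requests(scattered)))    # 16
```

In this model, a unit-stride access pattern collapses to one external request, while a widely strided pattern gains nothing from aggregation. A hardware controller would have to reach the same grouping with parallel logic rather than a sequential dictionary.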