Parallel application performance on shared high performance reconfigurable computing resources
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A versatile, low latency HyperTransport core
Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
RAT: a methodology for predicting performance in application design migration to FPGAs
HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
RAT: RC Amenability Test for Rapid Performance Prediction
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Performance Analysis with High-Level Languages for High-Performance Reconfigurable Computing
FCCM '08 Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines
Performance Analysis Framework for High-Level Language Applications in Reconfigurable Computing
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
Communication Performance Characterization for Reconfigurable Accelerator Design on the XD1000
RECONFIG '09 Proceedings of the 2009 International Conference on Reconfigurable Computing and FPGAs
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
FPGA implementation of kNN classifier based on wavelet transform and partial distance search
SCIA'07 Proceedings of the 15th Scandinavian conference on Image analysis
Nearest neighbor pattern classification
IEEE Transactions on Information Theory
RCML: An Environment for Estimation Modeling of Reconfigurable Computing Systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS' 09
Hi-index | 0.00 |
The design, implementation and optimization of FPGA accelerators is a challenging task, especially when the accelerator comprises multiple compute cores distributed across CPU and FPGA resources and memories and exhibits data-dependent runtime behavior. In order to simplify the development of FPGA accelerators we propose IMORC, an infrastructure and architecture template that helps raising the level of abstraction. The IMORC development flow bases on a modeling technique for visualizing an application's communication demand and an architecture template that aids the developer in implementing the design. The architectural template consists of a versatile on-chip interconnect with asynchronous FIFOs and bitwidth conversion placed into the communication links, a performance monitoring infrastructure for collecting performance information during runtime and a set of generic infrastructure cores which are frequently needed in accelerator designs. We demonstrate the usefulness of the IMORC development flow by means of the case study of accelerating the kth nearest neighbor thinning problem, where IMORC greatly helps us in understanding the communication demand and in implementing the application. With the integrated performance monitoring infrastructure, we gain insights into the data-dependent behavior of the accelerator that helps us in identifying bottlenecks and optimizing the accelerator to achieve a speedup of 10x to 40x over an optimized CPU implementation.