IMORC: An infrastructure and architecture template for implementing high-performance reconfigurable FPGA accelerators

Authors:
Tobias Schumacher; Christian Plessl; Marco Platzner
Affiliations:
Paderborn Center for Parallel Computing, University of Paderborn, 33098 Paderborn, Germany (all authors)
Venue:
Microprocessors & Microsystems
Year:
2012


Abstract

Designing, implementing, and optimizing FPGA accelerators is challenging, especially when the accelerator comprises multiple compute cores distributed across CPU and FPGA resources and memories and exhibits data-dependent runtime behavior. To simplify the development of FPGA accelerators, we propose IMORC, an infrastructure and architecture template that helps raise the level of abstraction. The IMORC development flow is based on a modeling technique for visualizing an application's communication demand and on an architecture template that aids the developer in implementing the design. The architecture template consists of a versatile on-chip interconnect with asynchronous FIFOs and bit-width conversion placed in the communication links, a performance monitoring infrastructure for collecting performance information at runtime, and a set of generic infrastructure cores that are frequently needed in accelerator designs. We demonstrate the usefulness of the IMORC development flow in a case study that accelerates the k-th nearest neighbor thinning problem, where IMORC greatly helps us understand the communication demand and implement the application. With the integrated performance monitoring infrastructure, we gain insights into the data-dependent behavior of the accelerator that help us identify bottlenecks and optimize the accelerator to achieve a speedup of 10x to 40x over an optimized CPU implementation.
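
To make the idea of a communication link with a FIFO, bit-width conversion, and performance monitoring more concrete, the following is a minimal software sketch. It is not IMORC's implementation: the class name WidthConvertingLink, its parameters, and the stall counters are hypothetical names chosen for illustration, the bit ordering (least significant bit first) is an assumption, and the two clock domains of an asynchronous FIFO are not modeled.

```python
from collections import deque


class WidthConvertingLink:
    """Toy model of a FIFO link whose producer and consumer use different
    word widths. All names here are hypothetical, not taken from IMORC."""

    def __init__(self, in_bits, out_bits, depth_bits):
        self.in_bits = in_bits        # width of words written by the producer
        self.out_bits = out_bits      # width of words read by the consumer
        self.depth_bits = depth_bits  # FIFO capacity, counted in bits
        self.bits = deque()           # buffered bits, oldest first
        self.write_stalls = 0         # monitor: writes rejected (FIFO full)
        self.read_stalls = 0          # monitor: reads rejected (too little data)

    def write(self, word):
        """Try to enqueue one in_bits-wide word; return False on a stall."""
        if len(self.bits) + self.in_bits > self.depth_bits:
            self.write_stalls += 1
            return False
        for i in range(self.in_bits):  # assumed order: least significant bit first
            self.bits.append((word >> i) & 1)
        return True

    def read(self):
        """Try to dequeue one out_bits-wide word; return None on a stall."""
        if len(self.bits) < self.out_bits:
            self.read_stalls += 1
            return None
        word = 0
        for i in range(self.out_bits):
            word |= self.bits.popleft() << i
        return word


if __name__ == "__main__":
    link = WidthConvertingLink(in_bits=32, out_bits=8, depth_bits=64)
    link.write(0x11223344)
    print([hex(link.read()) for _ in range(4)])
    print(link.read(), link.read_stalls)
```

In this sketch a 32-bit producer word is consumed as four 8-bit words, and the two counters stand in for the kind of runtime information a monitoring infrastructure might collect: a high write-stall count would point to a slow consumer, a high read-stall count to a slow producer.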