An FPGA-based multi-core approach for pipelining computing stages

Authors:
Ali Azarian;João M. P. Cardoso;Stephan Werner;Jürgen Becker
Affiliations:
Universidade do Porto, Porto, Portugal;Universidade do Porto, Porto, Portugal;Karlsruhe Institute of Technology -- KIT, Karlsruhe, Germany;Karlsruhe Institute of Technology -- KIT, Karlsruhe, Germany
Venue:
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Year:
2013

Abstract

In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications consisting mainly of Producer-Consumer tasks. This paper proposes an approach to achieve pipelined execution of Producer-Consumer pairs of tasks in FPGA-based multi-core architectures. Our approach is able to speed up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the Producer-Consumer tasks. To improve performance, we propose a technique to optimize out-of-order Producer-Consumer pairs in which the consumer uses each produced data element more than once, a behavior present in many applications (e.g., in image processing). All the schemes and optimizations proposed in this paper were evaluated with FPGA implementations. The experimental results show the feasibility of the approach for both in-order and out-of-order Producer-Consumer tasks. Furthermore, the results using our approach to task-level pipelining and a multi-core architecture reveal noticeable performance improvements for a number of benchmarks over a single-core implementation without task-level pipelining.
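The Producer-Consumer pipelining idea can be illustrated in software. What follows is a minimal Python sketch and not the authors' FPGA implementation. Two threads stand in for the cores. A bounded FIFO acts as the inter-stage buffer for the in-order case. For the out-of-order case, the sketch assumes a buffer with one "full" flag per element. The consumer-side local copy, which avoids re-reading elements that are used more than once, is likewise a hypothetical illustration of the reuse optimization described above. The names `producer`, `FlagBuffer`, `run_in_order` and `run_out_of_order` are invented for this example.

```python
import queue
import threading

# Hypothetical illustration only: threads stand in for the FPGA cores, and the
# buffer schemes below are assumptions, not the paper's hardware design.


def producer(n, emit):
    """Producer task: computes x[i] = i * i in index order and hands each
    element to the inter-stage buffer through the emit(i, value) callback."""
    for i in range(n):
        emit(i, i * i)


def run_in_order(n, depth=4):
    """In-order pair: a bounded FIFO is enough. put() blocks when the FIFO is
    full and get() blocks when it is empty, which synchronizes the two cores."""
    fifo = queue.Queue(maxsize=depth)
    t = threading.Thread(target=producer, args=(n, lambda i, v: fifo.put(v)))
    t.start()
    total = sum(fifo.get() for _ in range(n))  # consumer: reduction over x
    t.join()
    return total


class FlagBuffer:
    """Out-of-order buffer (assumed scheme): one slot and one 'full' flag per
    element index, so the consumer can wait for whichever index it needs."""

    def __init__(self, n):
        self.data = [None] * n
        self.full = [False] * n
        self.cv = threading.Condition()

    def put(self, i, value):
        with self.cv:
            self.data[i] = value
            self.full[i] = True
            self.cv.notify_all()

    def get(self, i):
        with self.cv:
            self.cv.wait_for(lambda: self.full[i])  # block until x[i] exists
            return self.data[i]


def run_out_of_order(n, reuse_locally):
    """Consumer computes y[i] = x[i+1] + x[i] + x[i-1], reading elements out of
    production order and using most of them three times. With reuse_locally,
    each element is fetched from the shared buffer once and then served from a
    consumer-local copy. Returns (y, number_of_shared_buffer_reads)."""
    buf = FlagBuffer(n)
    t = threading.Thread(target=producer, args=(n, buf.put))
    t.start()  # producer and consumer now run concurrently

    local = {}
    reads = 0

    def read(i):
        nonlocal reads
        if reuse_locally and i in local:
            return local[i]
        reads += 1
        value = buf.get(i)
        if reuse_locally:
            local[i] = value
        return value

    y = [read(i + 1) + read(i) + read(i - 1) for i in range(1, n - 1)]
    t.join()
    return y, reads


if __name__ == "__main__":
    print(run_in_order(10))               # 285
    print(run_out_of_order(10, False))    # ([5, 14, 29, 50, 77, 110, 149, 194], 24)
    print(run_out_of_order(10, True)[1])  # 10
```

The read counts show why reuse matters in the out-of-order case. Without a local copy, the consumer accesses the shared buffer 3(n-2) times. With a local copy, it accesses the buffer only n times. On an actual FPGA, these buffers would be hardware structures placed between the cores rather than Python objects.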