Stream processing applications such as image signal processing demand high throughput. At the same time, customers increasingly demand runtime flexibility in their designs, which custom ASIC solutions cannot provide. Current reconfigurable processors tend to offer too little throughput for widespread use in streaming applications. This paper demonstrates how structural-level pipelining techniques can be applied to rapidly, dynamically reconfigurable computing architectures to increase throughput. Registers are inserted automatically into the data path of performance-critical code sections that have already been optimised into a single configuration context. A new algorithm is presented that chooses the insertion points of pipeline stage registers so as to meet a specified throughput whilst minimising register resource usage. The paper then demonstrates a new approach in which properties of dynamic reconfiguration are used to perform pipeline stage initialisation and flushing. The technique is demonstrated on a real-life application, the demosaic filter in a standard image signal processing pipe used in modern digital cameras, where it boosts throughput from 16 MPixels/s to 51 MPixels/s on an example reconfigurable processor.
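To make the idea of register insertion concrete, the following is a minimal sketch in Python and not the algorithm presented in the paper. It assumes the data path is a directed acyclic graph of operations with known combinational delays, and uses a simple greedy as-soon-as-possible rule: an operation starts a new pipeline stage whenever it would otherwise exceed the target clock period. The function names `assign_stages` and `count_registers`, the node names, and all delay values are hypothetical. The sketch only reports how many registers the greedy placement needs; it does not attempt to minimise them.

```python
from collections import defaultdict


def assign_stages(delays, edges, period):
    """Greedy ASAP pipeline stage assignment for a DAG of operations.

    delays: dict mapping node -> combinational delay (hypothetical units)
    edges:  list of (src, dst) data dependencies
    period: target clock period; no stage may exceed this delay
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = {n: 0 for n in delays}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: visit every node after all of its predecessors.
    order = [n for n in delays if indegree[n] == 0]
    i = 0
    while i < len(order):
        for m in succs[order[i]]:
            indegree[m] -= 1
            if indegree[m] == 0:
                order.append(m)
        i += 1

    stage, finish = {}, {}
    for n in order:
        if delays[n] > period:
            raise ValueError(f"operation {n!r} alone exceeds the period")
        # Earliest legal stage is the latest stage of any predecessor.
        s = max((stage[p] for p in preds[n]), default=0)
        # Arrival time of the slowest input produced in that same stage.
        start = max((finish[p] for p in preds[n] if stage[p] == s), default=0)
        if start + delays[n] > period:
            # Too slow to fit: register the inputs and start a new stage.
            s, start = s + 1, 0
        stage[n], finish[n] = s, start + delays[n]
    return stage


def count_registers(stage, edges):
    """Count registers, assuming fan-outs of one value share a register chain."""
    crossings = {}
    for src, dst in edges:
        span = stage[dst] - stage[src]
        crossings[src] = max(crossings.get(src, 0), span)
    return sum(crossings.values())


# Hypothetical four-operation data path: a and b feed c, which feeds d.
delays = {"a": 4, "b": 3, "c": 5, "d": 2}
edges = [("a", "c"), ("b", "c"), ("c", "d")]

stages = assign_stages(delays, edges, period=8)
registers = count_registers(stages, edges)
print(stages, registers)
```

In this toy data path the unpipelined critical path is 4 + 5 + 2 = 11 delay units. Tightening the period to 8 forces a stage boundary in front of `c`, so both of its inputs must be registered. Relaxing the period to 10 moves the boundary to the single edge between `c` and `d`, which illustrates the trade-off between throughput target and register usage that the abstract describes.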