PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Instruction generation for hybrid reconfigurable systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic compilation to a coarse-grained reconfigurable system-opn-chip
ACM Transactions on Embedded Computing Systems (TECS)
Application-specific instruction generation for configurable processor architectures
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study
Proceedings of the conference on Design, automation and test in Europe - Volume 2
The chimaera reconfigurable functional unit
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
INSIDE: INstruction Selection/Identification & Design Exploration for Extensible Processors
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Scalable Processor Instruction Set Extension
IEEE Design & Test
Performance optimization using template mapping for datapath-intensive high-level synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.01 |
This paper presents the performance improvements by coupling a high-performance coarse-grained reconfigurable data-path with a microprocessor in a generic platform. It is composed by computational units able to realize complex operations which aid in improving the performance of time critical application parts, called kernels. A design flow is proposed for mapping software descriptions to the microprocessor system. Analytical study is performed by mapping eight real-life applications on three different instances of the system. Significant overall application speedups, relative to a software-only solution, ranging from 1.76 to 3.88 are reported being close to theoretical speedup bounds.