This article presents the speedups achieved in a generic single-chip microprocessor system by employing a high-performance datapath. The datapath acts as a coprocessor that accelerates computationally intensive kernel sections, thereby increasing overall performance. The datapath, which we introduced previously, is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. We present an automated method for synthesizing the coprocessor from a high-level software description, together with its integration into a design flow for executing applications on the system. To evaluate the effectiveness of the coprocessor approach, we perform an analytical study with respect to the type of custom datapath and the microprocessor architecture. The overall speedups of several real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. These speedups range from 1.75 to 5.84, with an average value of 3.04, while the overhead in circuit area is small. The design flow accelerates the applications to levels close to the theoretical speedup bounds. A comparison with another high-performance datapath shows that the proposed coprocessor achieves area-time products that are smaller by 23% on average for the generated datapaths. Additionally, the FCC coprocessor accelerates kernels better than software-programmable DSP cores do.
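The relationship between kernel acceleration and overall application speedup can be illustrated with a small sketch. The abstract does not state which analytical model the authors use, so the code below assumes the standard Amdahl's-law formulation; the function names, the kernel fraction, and the kernel speedup value are hypothetical and are not taken from the article.

```python
import math


def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall application speedup under Amdahl's law (an assumed model).

    kernel_fraction: share of software execution time spent in kernels (0..1).
    kernel_speedup:  factor by which the coprocessor accelerates those kernels.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Non-kernel code runs at its original speed; kernel time shrinks.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction: float) -> float:
    """Upper bound on overall speedup if kernels took zero time."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return math.inf
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: kernels take 75% of the time and run 3x faster.
    print(overall_speedup(0.75, 3.0))  # 2.0
    print(speedup_bound(0.75))         # 4.0
```

Under this assumed model, an application whose kernels account for 75% of execution time cannot exceed a 4x overall speedup, however fast the coprocessor is. That is the sense in which a measured speedup can be compared against a theoretical bound.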