A high-performance data path for synthesizing DSP kernels
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
The speedups and energy reductions achieved in a generic single-chip microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates computationally intensive kernel sections, thereby increasing overall performance. The authors have previously introduced this data-path, which is composed of flexible computational components (FCCs). These components can realize any two-level sequence of primitive operations. An automated method for synthesizing the coprocessor from a high-level software description is presented, together with its integration into a design flow for executing applications on the system. Using this design flow, the overall speedups of eleven real-life applications are estimated relative to executing them entirely in software on the microprocessor. These speedups are close to the theoretical bounds. They range from 1.78 to 5.84 with an average of 3.04, while the circuit-area overhead is small. The energy savings range from 41% to 74%, and the application energy-delay product is reduced by 80% on average. A comparison with another high-performance data-path showed that the proposed coprocessor achieves better performance, consumes less energy, and has smaller area-time products for the generated data-paths. Additionally, the FCC data-path accelerates kernels better than a VLIW DSP core.
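The abstract says only that an FCC can realize any two-level sequence of primitive operations. The Python sketch makes that property concrete under stated assumptions. The primitive set (`add`, `sub`, `mul`), the expression-tree representation, and the helper names `Op`, `depth`, `fits_fcc` and `evaluate` are all hypothetical and do not come from the paper. In this simplified model, a kernel fragment maps onto one FCC when its longest input-to-output path contains at most two primitive operations.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical primitive set; the operations a real FCC supports may differ.
PRIMITIVES: dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


@dataclass(frozen=True)
class Op:
    """One primitive operation applied to two operand expressions."""
    name: str
    lhs: Expr
    rhs: Expr


# An expression is an integer constant, an input variable name, or an Op.
Expr = Union[int, str, Op]


def depth(e: Expr) -> int:
    """Number of primitive operations on the longest path from inputs to output."""
    if isinstance(e, Op):
        return 1 + max(depth(e.lhs), depth(e.rhs))
    return 0


def uses_only_primitives(e: Expr) -> bool:
    """True if every operation in e belongs to the (assumed) primitive set."""
    if isinstance(e, Op):
        return (e.name in PRIMITIVES
                and uses_only_primitives(e.lhs)
                and uses_only_primitives(e.rhs))
    return True


def fits_fcc(e: Expr) -> bool:
    """Simplified model: e is a sequence of at most two levels of primitives."""
    return depth(e) <= 2 and uses_only_primitives(e)


def evaluate(e: Expr, env: dict[str, int]) -> int:
    """Compute e, looking up input variables in env."""
    if isinstance(e, Op):
        return PRIMITIVES[e.name](evaluate(e.lhs, env), evaluate(e.rhs, env))
    if isinstance(e, str):
        return env[e]
    return e


if __name__ == "__main__":
    mac = Op("add", Op("mul", "a", "b"), "c")  # multiply-accumulate (a * b) + c
    print(fits_fcc(mac), evaluate(mac, {"a": 2, "b": 3, "c": 4}))
```

For example, a multiply-accumulate `(a * b) + c` has depth 2 and fits. The same holds for `(a + b) * (c - d)`. A chain such as `((a * b) + c) - d` has depth 3, so under this simplified model it would need more than one FCC.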
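The abstract reports that the speedups are close to theoretical bounds, but this excerpt does not state the bound. A common way to express such a bound is Amdahl's law. Suppose a fraction f of the software execution time lies in the accelerated kernels and those kernels run s times faster. The overall speedup is then 1 / ((1 - f) + f / s), which can never exceed 1 / (1 - f). We assume, without confirmation from the source, that this is the kind of bound meant. The sketch also computes the reduction in energy-delay product (EDP), where EDP is energy multiplied by execution time, from a fractional energy saving and a speedup. The function names and all example inputs are hypothetical and are not results from the paper.

```python
import math


def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when kernel_fraction of the software execution time
    is accelerated by a factor of kernel_speedup (Amdahl's law)."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_bound(kernel_fraction: float) -> float:
    """Limit of amdahl_speedup as kernel_speedup grows without bound."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return math.inf
    return 1.0 / (1.0 - kernel_fraction)


def edp_reduction(energy_saving: float, speedup: float) -> float:
    """Fractional reduction of the energy-delay product E * T.

    energy_saving is the fractional energy reduction (0.5 means 50%);
    speedup is old execution time divided by new execution time.
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    # new EDP / old EDP = (new E / old E) * (new T / old T) = (1 - saving) / speedup
    return 1.0 - (1.0 - energy_saving) / speedup


if __name__ == "__main__":
    # Hypothetical inputs, not data from the paper.
    f, s = 0.8, 10.0
    print(f"speedup = {amdahl_speedup(f, s):.2f}, bound = {amdahl_bound(f):.2f}")
    print(f"EDP reduction = {edp_reduction(0.5, 2.0):.0%}")
```

With 80% of the time spent in kernels that run tenfold faster, the overall speedup is about 3.57, against a bound of 5. A 50% energy saving combined with a 2x speedup cuts EDP by 75%. This shows how an EDP reduction can exceed the energy saving alone: the speedup also shortens the delay term.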