A hardware/software partitioner using a dynamically determined granularity
DAC '97 Proceedings of the 34th annual Design Automation Conference
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Energy-conscious HW/SW-partitioning of embedded systems: a case study on an MPEG-2 encoder
Proceedings of the 6th international workshop on Hardware/software codesign
A low power hardware/software partitioning approach for core-based embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Mapping a Single Assignment Programming Language to Reconfigurable Systems
The Journal of Supercomputing
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Hardware-Software Cosynthesis for Microcontrollers
IEEE Design & Test
Proceedings of the 40th annual Design Automation Conference
Partitioning and Exploration Strategies in the TOSCA Co-Design Flow
CODES '96 Proceedings of the 4th International Workshop on Hardware/Software Co-Design
Frequent loop detection using efficient non-intrusive on-chip hardware
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
A compiled accelerator for biological cell signaling simulations
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
ACM Transactions on Embedded Computing Systems (TECS)
Dynamic FPGA routing for just-in-time FPGA compilation
Proceedings of the 41st annual Design Automation Conference
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Hardware/software partitioning of software binaries: a case study of H.264 decode
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
New decompilation techniques for binary-level co-processor generation
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Proceedings of the 41st annual Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Concept-based partitioning for large multidomain multifunctional embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.01 |
Warp processors are a novel architecture capable of autonomously optimizing an executing application by dynamically re-implementing critical kernels within the software as custom hardware circuits in an on-chip FPGA. Previous research on warp processing focused on low-power embedded systems, incorporating a low-end ARM processor as the main software execution resource. We provide a thorough analysis of the scalability of warp processing by evaluating several possible warp processor implementations, from low-power to high-performance, and by evaluating the potential for parallel execution of the partitioned software and hardware. We further demonstrate that even considering a high-performance 1 GHz embedded processor, warp processing provides the equivalent performance of a 2.4GHz processor. By further enabling parallel execution between the processes and FPGA, the parallel warp processor execution provides the equivalent performance of a 3.2 GHz processor.