Numerical recipes in C (2nd ed.): the art of scientific computing
Numerical recipes in C (2nd ed.): the art of scientific computing
A hardware/software partitioner using a dynamically determined granularity
DAC '97 Proceedings of the 34th annual Design Automation Conference
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
The design of an SRAM-based field-programmable gate array—part I: architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Speed and area tradeoffs in cluster-based FPGA architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
New methods to color the vertices of a graph
Communications of the ACM
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Mapping a Single Assignment Programming Language to Reconfigurable Systems
The Journal of Supercomputing
Architecture and CAD for Deep-Submicron FPGAs
Architecture and CAD for Deep-Submicron FPGAs
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Hardware-Software Cosynthesis for Microcontrollers
IEEE Design & Test
CC '96 Proceedings of the 6th International Conference on Compiler Construction
VPR: A new packing, placement and routing tool for FPGA research
FPL '97 Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications
Hardware/software partitioning of software binaries
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Dynamic hardware/software partitioning: a first approach
Proceedings of the 40th annual Design Automation Conference
Proceedings of the 40th annual Design Automation Conference
Partitioning and Exploration Strategies in the TOSCA Co-Design Flow
CODES '96 Proceedings of the 4th International Workshop on Hardware/Software Co-Design
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
NAPA C: Compiling for a Hybrid RISC/FPGA Architecture
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Assembly to High-Level Language Translation
ICSM '98 Proceedings of the International Conference on Software Maintenance
A Programmable Co-processor for Profiling
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Frequent loop detection using efficient non-intrusive on-chip hardware
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
A compiled accelerator for biological cell signaling simulations
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A fast on-chip profiler memory using a pipelined binary tree
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Automatic translation of software binaries onto FPGAs
Proceedings of the 41st annual Design Automation Conference
Dynamic FPGA routing for just-in-time FPGA compilation
Proceedings of the 41st annual Design Automation Conference
Programming models and architectures for FPGA platforms
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Hardware/software partitioning of software binaries: a case study of H.264 decode
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A Study of the Scalability of On-Chip Routing for Just-in-Time FPGA Compilation
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
New decompilation techniques for binary-level co-processor generation
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Low-power warp processor for power efficient high-performance embedded systems
Proceedings of the conference on Design, automation and test in Europe
Thread warping: a framework for dynamic synthesis of thread accelerators
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Run-time instruction set selection in a transmutable embedded processor
Proceedings of the 45th annual Design Automation Conference
Run-time system for an extensible embedded processor with dynamic instruction set
Proceedings of the conference on Design, automation and test in Europe
Transparent reconfigurable acceleration for heterogeneous embedded applications
Proceedings of the conference on Design, automation and test in Europe
Run-Time Adaptable Architectures for Heterogeneous Behavior Embedded Systems
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Non-intrusive dynamic application profiler for detailed loop execution characterization
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Design and implementation of a MicroBlaze-based warp processor
ACM Transactions on Embedded Computing Systems (TECS)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Scalability and parallel execution of warp processing: dynamic hardware/software partitioning
International Journal of Parallel Programming
Dynamically Adapted Low Power ASIPs
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Strategies for dynamic memory allocation in hybrid architectures
Proceedings of the 6th ACM conference on Computing frontiers
Customized kernel execution on reconfigurable hardware for embedded applications
Microprocessors & Microsystems
Efficient memory management for hardware accelerated Java Virtual Machines
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Runtime Adaptive Extensible Embedded Processors -- A Survey
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Non-intrusive dynamic application profiling for multitasked applications
Proceedings of the 46th Annual Design Automation Conference
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hardware JIT compilation for off-the-shelf dynamically reconfigurable FPGAs
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
KAHRISMA: a novel hypermorphic reconfigurable-instruction-set multi-grained-array architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Reliability- and process variation-aware placement for FPGAs
Proceedings of the Conference on Design, Automation and Test in Europe
Automatically mapping applications to a self-reconfiguring platform
Proceedings of the Conference on Design, Automation and Test in Europe
Toward a runtime system for reconfigurable computers: a virtualization approach
Proceedings of the Conference on Design, Automation and Test in Europe
Binary acceleration using coarse-grained reconfigurable architecture
ACM SIGARCH Computer Architecture News
Efficient hardware-based nonintrusive dynamic application profiling
ACM Transactions on Embedded Computing Systems (TECS)
Logarithmic-Time FPGA Bitstream Analysis: A Step Towards JIT Hardware Compilation
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Thread Warping: Dynamic and Transparent Synthesis of Thread Accelerators
ACM Transactions on Design Automation of Electronic Systems (TODAES)
CReAMS: an embedded multiprocessor platform
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
International Journal of Reconfigurable Computing - Special issue on selected papers from the 17th reconfigurable architectures workshop (RAW2010)
Dynamic data folding with parameterizable FPGA configurations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
An exploration of mechanisms for dynamic cryptographic instruction set extension
CHES'11 Proceedings of the 13th international conference on Cryptographic hardware and embedded systems
Boosting single thread performance in mobile processors via reconfigurable acceleration
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Improving communication latency with the write-only architecture
Journal of Parallel and Distributed Computing
Parallel partitioning for distributed systems using sequential assignment
Journal of Parallel and Distributed Computing
ACM Transactions on Embedded Computing Systems (TECS)
Towards a multiple-ISA embedded system
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the Conference on Design, Automation and Test in Europe
Partial online-synthesis for mixed-grained reconfigurable architectures
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
MORP: makespan optimization for processors with an embedded reconfigurable fabric
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
A just-in-time customizable processor
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
We describe a new processing architecture, known as a warp processor, that utilizes a field-programmable gate array (FPGA) to improve the speed and energy consumption of a software binary executing on a microprocessor. Unlike previous approaches that also improve software using an FPGA but do so using a special compiler, a warp processor achieves these improvements completely transparently and operates from a standard binary. A warp processor dynamically detects the binary's critical regions, reimplements those regions as a custom hardware circuit in the FPGA, and replaces the software region by a call to the new hardware implementation of that region. While not all benchmarks can be improved using warp processing, many can, and the improvements are dramatically better than those achievable by more traditional architecture improvements. The hardest part of warp processing is that of dynamically reimplementing code regions on an FPGA, requiring partitioning, decompilation, synthesis, placement, and routing tools, all having to execute with minimal computation time and data memory so as to coexist on chip with the main processor. We describe the results of developing our warp processor. We developed a custom FPGA fabric specifically designed to enable lean place and route tools, and we developed extremely fast and efficient versions of partitioning, decompilation, synthesis, technology mapping, placement, and routing. Warp processors achieve overall application speedups of 6.3X with energy savings of 66&percent; across a set of embedded benchmark applications. We further show that our tools utilize acceptably small amounts of computation and memory which are far less than traditional tools. Our work illustrates the feasibility and potential of warp processing, and we can foresee the possibility of warp processing becoming a feature in a variety of computing domains, including desktop, server, and embedded applications.