New decompilation techniques for binary-level co-processor generation

Authors:
G. Stitt; F. Vahid
Affiliations:
Dept. of Comput. Sci. & Eng., California Univ., Riverside, CA, USA;Dept. of Comput. Sci. & Eng., California Univ., Riverside, CA, USA
Venue:
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Year:
2005

Abstract

Existing ASIP (application-specific instruction-set processor) and compiler-based co-processor synthesis approaches meet the increasing performance requirements of embedded applications while consuming less power than high-performance gigahertz microprocessors. However, these approaches place restrictions on software languages and compilers. Binary-level co-processor generation has previously been proposed as a complementary approach that reduces the impact of tool restrictions by supporting all languages and compilers, at the cost of some decrease in performance. In a binary-level approach, decompilation recovers much of the high-level information, such as loops and arrays, needed for effective synthesis, and in many cases yields hardware similar to that of a compiler-based approach. However, previous binary-level approaches have not considered the effects of software compiler optimizations on the resulting hardware. In this paper, we introduce two new decompilation techniques, strength promotion and loop rerolling, and show that they are necessary to synthesize an efficient custom hardware co-processor from a binary in the presence of software compiler optimizations. In addition, unlike previous approaches, we show the robustness of binary-level co-processor generation by achieving order-of-magnitude speedups for binaries generated for three different instruction sets (MIPS, ARM, and MicroBlaze) using two different levels of compiler optimization.
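
The abstract names the two decompilation techniques without describing how they work. As a rough illustration only, the Python sketch below assumes that strength promotion means folding a compiler-generated shift/add sequence back into a single constant multiplication, and that loop rerolling means collapsing an unrolled loop body back into a compact loop. All function names and the term encoding are hypothetical and do not come from the paper.

```python
# Illustrative sketch only; names and data layout are hypothetical.

def promote_shift_add(terms):
    """Fold a shift/add sequence on one operand into a single multiplier.

    `terms` is a list of (sign, shift) pairs, each meaning sign * (x << shift).
    For example, [(+1, 3), (+1, 1)] stands for (x << 3) + (x << 1).
    """
    return sum(sign * (1 << shift) for sign, shift in terms)


def sum_unrolled(a):
    """A summation loop as it might look after unrolling by a factor of 2."""
    total = 0
    for i in range(0, len(a), 2):  # assumes len(a) is even
        total += a[i]
        total += a[i + 1]
    return total


def sum_rerolled(a):
    """The same loop after rerolling: one copy of the body, stride 1."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


# Strength promotion: the shift/add form and the multiply form agree.
x = 7
multiplier = promote_shift_add([(+1, 3), (+1, 1)])
assert (x << 3) + (x << 1) == x * multiplier

# Loop rerolling: both loop shapes compute the same result.
data = [1, 2, 3, 4, 5, 6]
assert sum_unrolled(data) == sum_rerolled(data)
```

Under these assumptions, the recovered forms would leave a synthesis tool free to choose its own multiplier implementation and its own unrolling factor, rather than inheriting choices the software compiler made for a processor.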