Exploring the design space of LUT-based transparent accelerators

Authors:
Sami Yehia;Nathan Clark;Scott Mahlke;Krisztián Flautner
Affiliations:
ARM, Ltd., Cambridge, United Kingdom;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;ARM, Ltd., Cambridge, United Kingdom
Venue:
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2005

Abstract

Instruction set customization accelerates the performance of applications by compressing the length of critical dependence paths and reducing the demands on processor resources. With instruction set customization, specialized accelerators are added to a conventional processor to atomically execute dataflow subgraphs. Accelerators that are exploited without explicit changes to the instruction set architecture of the processor are said to be transparent. Transparent acceleration relies on a lightweight hardware engine to dynamically generate control signals for the accelerator, using subgraphs delineated by a compiler. The design of transparent subgraph accelerators is challenging, as critical subgraphs need to be supported efficiently while maintaining area and timing constraints. Additionally, more complex accelerators require more sophisticated control generation engines. These factors must be carefully balanced. In this work, we investigate the design of subgraph accelerators using configurable lookup table structures. These designs provide an effective paradigm to execute a wide range of subgraphs involving arithmetic and logic operations. We describe why lookup table designs are effective and how they fit into a transparent acceleration framework, and we evaluate the effectiveness of a wide range of designs using both simulation and logic synthesis.
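
As a rough illustration of why a configurable lookup table can execute a dataflow subgraph in a single step, the sketch below collapses a chain of three dependent bitwise operations into one 3-input truth table and then applies that table at every bit position of the operands. This is not the accelerator design evaluated in the paper: the function names (`build_lut`, `apply_lut`, `subgraph`), the 32-bit width, and the example subgraph are all assumptions made for illustration, and the sketch covers only logic operations, not the arithmetic (carry-dependent) operations the abstract also mentions.

```python
def build_lut(fn, n_inputs):
    """Tabulate a 1-bit boolean function of n_inputs bits into a truth table.

    Bit i of the table index holds the value of input i.
    """
    table = []
    for idx in range(1 << n_inputs):
        bits = [(idx >> i) & 1 for i in range(n_inputs)]
        table.append(fn(*bits) & 1)
    return table


def apply_lut(table, operands, width=32):
    """Evaluate the table once per bit position across width-bit operands."""
    result = 0
    for pos in range(width):
        idx = 0
        for i, word in enumerate(operands):
            idx |= ((word >> pos) & 1) << i
        result |= table[idx] << pos
    return result


# Hypothetical subgraph of three dependent operations:
#   t = a & b;  u = t ^ c;  r = u | a
def subgraph(a, b, c):
    return ((a & b) ^ c) | a


# "Configure" the lookup table for this subgraph, then execute it in one pass.
lut = build_lut(subgraph, 3)
a, b, c = 0xF0F0A5A5, 0x0FF0FFFF, 0x12345678
assert apply_lut(lut, [a, b, c]) == subgraph(a, b, c)
```

The three-operation dependence chain becomes a single table lookup per bit, which is the sense in which a lookup table can shorten a critical dependence path. Supporting a different logic subgraph only requires loading a different eight-entry table.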