The ARISE approach for extending embedded processors with arbitrary hardware accelerators

Authors:
Nikolaos Vassiliadis;George Theodoridis;Spiridon Nikolaidis
Affiliations:
Electronics and Computers Section, Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece;Electronics and Computers Section, Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece;Electronics and Computers Section, Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

ARISE introduces a systematic approach in which an embedded processor is extended once and can thereafter be coupled with an arbitrary number of custom computing units (CCUs). A CCU can be a hardwired or a reconfigurable unit and can be used under a tight and/or loose model of computation. By selecting the appropriate model of computation for each part of the application, the complete application space is considered for acceleration, resulting in significant performance improvements. ARISE also offers modularity and scalability and is not restricted by the opcode-space and operand limitations that exist in this type of machine. To support these features, we introduce a machine organization that allows a processor and a set of CCUs to cooperate. To control the CCUs, we extend the instruction set of the processor once, with eight instructions. To incorporate these features efficiently into an embedded processor, we propose a micro-architecture implementation that minimizes the control and communication overhead between the processor and the CCUs. To evaluate our proposal, we extended a MIPS processor with the ARISE infrastructure and implemented it on a Xilinx field-programmable gate array (FPGA). Implementation results demonstrate that the timing model of the processor is not affected. We also implemented a set of benchmarks on the ARISE evaluation machine. Performance results show significant improvements and reduced communication overhead compared to a typical coprocessor approach.
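The point about opcode space can be illustrated with a minimal sketch. It is not the ARISE micro-architecture: the abstract does not name the eight instructions or describe the hardware structures, so the `CcuTable` class, its `attach` and `execute` methods, and the two example units are all hypothetical. The sketch only shows the general idea that a fixed set of operations, which select a unit by identifier, lets new units be attached without defining a new opcode for each one.

```python
from typing import Callable, Dict, List

# Hypothetical model: a CCU is represented as a plain function from a list
# of integer operands to one integer result. Real CCUs are hardware units.
Ccu = Callable[[List[int]], int]


class CcuTable:
    """Maps a small integer ID to an attached unit (assumed structure)."""

    def __init__(self) -> None:
        self._units: Dict[int, Ccu] = {}

    def attach(self, ccu_id: int, unit: Ccu) -> None:
        # Attaching a unit changes only the table, not the set of operations.
        if ccu_id in self._units:
            raise ValueError(f"CCU slot {ccu_id} is already in use")
        self._units[ccu_id] = unit

    def execute(self, ccu_id: int, operands: List[int]) -> int:
        # One generic operation serves every unit; the ID selects which one.
        return self._units[ccu_id](operands)


table = CcuTable()

# Two illustrative units (assumptions, not taken from the paper):
# a multiply-accumulate over operand pairs, and a value-range computation.
table.attach(0, lambda ops: sum(a * b for a, b in zip(ops[0::2], ops[1::2])))
table.attach(1, lambda ops: max(ops) - min(ops))

mac_result = table.execute(0, [1, 2, 3, 4])   # 1*2 + 3*4
range_result = table.execute(1, [7, 3, 9])    # 9 - 3
```

Passing operands as a list in this sketch also sidesteps the fixed operand count of a conventional instruction format; how ARISE itself handles operands is not stated in the abstract.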