An architecture framework for an adaptive extensible processor

Authors:
Hamid Noori; Farhad Mehdipour; Kazuaki Murakami; Koji Inoue; Morteza Saheb Zamani
Affiliations:
Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; Research Institute for Information Technology, Computing and Communication Center, Kyushu University, Fukuoka, Japan; Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; IT and Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran
Venue:
The Journal of Supercomputing
Year:
2008

Abstract

An effective technique for improving the performance of embedded processors is to collapse critical computation subgraphs into application-specific instruction set extensions and execute them on custom functional units. The drawback of this approach is the immense cost and long time required to design a new processor for each application. To address this, we propose an adaptive extensible processor in which custom instructions (CIs) are generated and added after chip fabrication. To support this feature, the custom functional units are replaced by a reconfigurable matrix of functional units (FUs). A systematic quantitative approach is used to determine the appropriate structure of the reconfigurable functional unit (RFU). We also introduce an integrated framework for generating CIs that can be mapped onto the RFU. With this architecture, performance improves by a factor of up to 1.33, and by a factor of 1.16 on average, compared to a 4-issue in-order RISC processor. Partitioning the configuration memory, detecting similar/subset CIs, and merging small CIs reduces the size of the configuration memory by 40%.
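
The abstract mentions detecting similar/subset CIs as one way to shrink the configuration memory. The following is a minimal sketch of that idea, not the authors' algorithm. It assumes, purely for illustration, that a CI's configuration can be modeled as a set of hypothetical `(row, column, opcode)` entries for the FU matrix, and that a CI whose entries are all contained in another CI's entries can reuse that stored configuration. The names `FUConfig`, `share_configurations`, and `saved_fraction` are invented here, and the resulting numbers say nothing about the 40% figure reported above.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical encoding of one configured FU: (row, column, opcode).
FUConfig = Tuple[int, int, str]


def share_configurations(cis: Dict[str, FrozenSet[FUConfig]]) -> Dict[str, str]:
    """Map each CI name to the CI whose stored configuration it reuses.

    Assumption: a CI can reuse another CI's configuration when its set of
    FU entries is identical to, or a subset of, that CI's entries.
    """
    # Visit larger CIs first so they become the stored "host" configurations.
    order = sorted(cis, key=lambda name: (-len(cis[name]), name))
    hosts = []   # CIs whose configuration is actually stored
    owner = {}   # CI name -> host name
    for name in order:
        for host in hosts:
            if cis[name] <= cis[host]:   # identical or subset
                owner[name] = host
                break
        else:
            hosts.append(name)
            owner[name] = name
    return owner


def saved_fraction(owner: Dict[str, str]) -> float:
    """Fraction of configuration entries avoided by sharing."""
    if not owner:
        return 0.0
    stored = sum(1 for name, host in owner.items() if name == host)
    return 1 - stored / len(owner)


# Toy input: ci_b is a subset of ci_a, ci_c is identical to ci_a.
example_cis = {
    "ci_a": frozenset({(0, 0, "add"), (1, 0, "shl"), (2, 0, "xor")}),
    "ci_b": frozenset({(0, 0, "add"), (1, 0, "shl")}),
    "ci_c": frozenset({(0, 0, "add"), (1, 0, "shl"), (2, 0, "xor")}),
    "ci_d": frozenset({(0, 1, "sub"), (1, 0, "and")}),
}
example_owner = share_configurations(example_cis)
```

In this toy input only `ci_a` and `ci_d` need their own stored configuration. A real RFU would also need a way to bypass the FUs that a subset CI does not use, which this sketch does not model.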