Code generation using tree matching and dynamic programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Instruction selection using binate covering for code size optimization
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Code Generation for Embedded Processors
Code Generation for Embedded Processors
Instruction generation and regularity extraction for reconfigurable processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
High-Performance 3-1 Interlock Collapsing ALU's
IEEE Transactions on Computers
Synthesis of custom processors based on extensible platforms
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Instruction Pre-Processing in Trace Processors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
DISE: a programmable macro engine for customizing applications
Proceedings of the 30th annual international symposium on Computer architecture
Code generation and optimization for embedded digital signal processors
Code generation and optimization for embedded digital signal processors
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Processor Acceleration Through Automated Instruction Set Customization
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Characterizing embedded applications for instruction-set extensible processors
Proceedings of the 41st annual Design Automation Conference
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Exploring the design space of LUT-based transparent accelerators
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
A new idiom recognition framework for exploiting hardware-assist instructions
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Scalable subgraph mapping for acyclic computation accelerators
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Code transformation strategies for extensible embedded processors
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
Proceedings of the International Symposium on Code Generation and Optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Journal of VLSI Signal Processing Systems
Proceedings of the conference on Design, automation and test in Europe
Improving instruction level parallelism through reconfigurable units in superscalar processors
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 13th international symposium on Low power electronics and design
An architecture framework for an adaptive extensible processor
The Journal of Supercomputing
Handling Control Data Flow Graphs for a Tightly Coupled Reconfigurable Accelerator
ICESS '07 Proceedings of the 3rd international conference on Embedded Software and Systems
ARISE Machines: Extending Processors with Hybrid Accelerators
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Rapid design of area-efficient custom instructions for reconfigurable embedded processing
Journal of Systems Architecture: the EUROMICRO Journal
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 6th ACM conference on Computing frontiers
AnySP: anytime anywhere anyway signal processing
Proceedings of the 36th annual international symposium on Computer architecture
An Application Development Framework for ARISE Reconfigurable Processors
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Runtime Adaptive Extensible Embedded Processors -- A Survey
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
CGRA express: accelerating execution using dynamic operation fusion
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Dynamically utilizing computation accelerators for extensible processors in a software approach
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the 20th symposium on Great lakes symposium on VLSI
HiPC'08 Proceedings of the 15th international conference on High performance computing
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
KAHRISMA: a novel hypermorphic reconfigurable-instruction-set multi-grained-array architecture
Proceedings of the Conference on Design, Automation and Test in Europe
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Scientific Application Demands on a Reconfigurable Functional Unit Interface
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
The ARISE approach for extending embedded processors with arbitrary hardware accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SIMD defragmenter: efficient ILP realization on data-parallel architectures
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The Journal of Supercomputing
Mixing static and dynamic strategies for high performance and low area reconfigurable systems
International Journal of High Performance Systems Architecture
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Energy efficient special instruction support in an embedded processor with compact isa
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Architecture for transparent binary acceleration of loops with memory accesses
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
ACM Transactions on Architecture and Code Optimization (TACO)
Idiom recognition framework using topological embedding
ACM Transactions on Architecture and Code Optimization (TACO)
A just-in-time customizable processor
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Instruction set customization is an effective way to improve processor performance. Critical portions of applicationdata-flow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs will compress the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application is immense. To overcome this roadblock, this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general-purpose processor. Hardware accelerators are added to the processor to execute the collapsed subgraphs. A simple microarchitectural interface is provided to support a plug-and-play model for integrating a wide range of accelerators into a pre-designed and verified processor core. The accelerators are exploited using an approach of static identification and dynamic realization. The compiler is responsible for identifying profitable subgraphs, while the hardware handles discovery, mapping, and execution of compatible subgraphs. This paper presents the design of a plug-and-play transparent accelerator system and evaluates the cost/performance implications of the design.