Principles of CMOS VLSI design: a systems perspective
Principles of CMOS VLSI design: a systems perspective
IEEE Transactions on Computers
Machine organization of the IBM RISC System/6000 processor
IBM Journal of Research and Development
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
PRISC: programmable reduced instruction set computers
PRISC: programmable reduced instruction set computers
Communications of the ACM
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Low power and high performance design challenges in future technologies
GLSVLSI '00 Proceedings of the 10th Great Lakes symposium on VLSI
VIS Speeds New Media Processing
IEEE Micro
Subword Parallelism with MAX-2
IEEE Micro
High-Performance 3-1 Interlock Collapsing ALU's
IEEE Transactions on Computers
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Specifying and Compiling Applications for RaPiD
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
The NAPA Adaptive Processing Architecture
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A dynamic instruction set computer
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
A C compiler for a processor with a reconfigurable functional unit
FPGA '00 Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays
The effect of reconfigurable units in superscalar processors
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
Automated design of finite state machine predictors for customized processors
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
NanoFabrics: spatial computing using molecular electronics
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Datapath merging and interconnection sharing for reconfigurable architectures
Proceedings of the 15th international symposium on System Synthesis
Reconfigurable media processing
Parallel Computing - Parallel computing in image and video processing
Pattern Recognition Tool to Detect Reconfigurable Patterns in MPEG4 Video Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Quantitative Understanding of the Performance of Reconfigurable Coprocessors
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Global resource sharing for synthesis of control data flow graphs on FPGAs
Proceedings of the 40th annual Design Automation Conference
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Automatic compilation to a coarse-grained reconfigurable system-opn-chip
ACM Transactions on Embedded Computing Systems (TECS)
A scalable wide-issue clustered VLIW with a reconfigurable interconnect
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Performance of reconfigurable architectures for image-processing applications
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable systems
A reconfigurable unit for a clustered programmable-reconfigurable processor
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Proceedings of the 1st conference on Computing frontiers
Automatic translation of software binaries onto FPGAs
Proceedings of the 41st annual Design Automation Conference
Characterizing embedded applications for instruction-set extensible processors
Proceedings of the 41st annual Design Automation Conference
Introduction of local memory elements in instruction set extensions
Proceedings of the 41st annual Design Automation Conference
Spinach: a liberty-based simulator for programmable network interface architectures
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Proceedings of the 31st annual international symposium on Computer architecture
Automatic application-specific instruction-set extensions under microarchitectural constraints
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Low-Power High-Performance Reconfigurable Computing Cache Architectures
IEEE Transactions on Computers
Scalable custom instructions identification for instruction-set extensible processors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Instruction set extension with shadow registers for configurable processors
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
IEEE Transactions on Computers
Exploring the design space of LUT-based transparent accelerators
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Proceedings of the 2006 ACM symposium on Applied computing
Tartan: evaluating spatial computation for whole program execution
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Scalable subgraph mapping for acyclic computation accelerators
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Serialization-Aware Mini-Graphs: Performance with Fewer Resources
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor
Microprocessors & Microsystems
Architecture and compiler optimizations for data bandwidth improvement in configurable processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Architecture and Development Flow of the S5 Software Configurable Processor
Journal of VLSI Signal Processing Systems
RoSA: a reconfigurable stream-based architecture
Proceedings of the 20th annual conference on Integrated circuits and systems design
Embedded real-time architecture for level-set-based active contours
EURASIP Journal on Applied Signal Processing
Mapping streaming architectures on reconfigurable platforms
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An overview of a compiler for mapping software binaries to hardware
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Superscalar architecture design for high performance DSP operations
Microprocessors & Microsystems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEICE - Transactions on Information and Systems
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
FlexCore: Utilizing Exposed Datapath Control for Efficient Computing
Journal of Signal Processing Systems
Runtime Adaptive Extensible Embedded Processors -- A Survey
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Hardware parallelism vs. software parallelism
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
KAHRISMA: a novel hypermorphic reconfigurable-instruction-set multi-grained-array architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Cross-architectural design space exploration tool for reconfigurable processors
Proceedings of the Conference on Design, Automation and Test in Europe
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion
Proceedings of the 8th ACM International Conference on Computing Frontiers
RCMP: a reconfigurable chip-multiprocessor architecture
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Generation of control and data flow graphs from scheduled and pipelined assembly code
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Selective instruction set muting for energy-aware adaptive processors
Proceedings of the International Conference on Computer-Aided Design
Rethinking FPGAs: elude the flexibility excess of LUTs with and-inverter cones
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
International Journal of Reconfigurable Computing - Special issue on Selected Papers from the International Conference on Reconfigurable Computing and FPGAs (ReConFig'10)
Mixing static and dynamic strategies for high performance and low area reconfigurable systems
International Journal of High Performance Systems Architecture
A coarse-grained reconfigurable architecture with compilation for high performance
International Journal of Reconfigurable Computing - Special issue on High-Performance Reconfigurable Computing
Architecture support for custom instructions with memory operations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Triggered instructions: a control paradigm for spatially-programmed architectures
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.01 |
Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically-scheduled superscalar processor. Chimaera is capable of performing 9-input/1-output operations on integer data. We discuss the Chimaera C compiler that automatically maps computations for execution in the RFU. Chimaera is capable of: (1) collapsing a set of instructions into RFU operations, (2) converting control-flow into RFU operations, and (3) supporting a more powerful fine-grain data-parallel model than that supported by current multimedia extension instruction sets (for integer operations). Using a set of multimedia and communication applications we show that even with simple optimizations, the Chimaera C compiler is able to map 22% of all instructions to the RFU on the average. A variety of computations are mapped into RFU operations ranging from as simple as add/sub-shift pairs to operations of more than 10 instructions including several branches. Timing experiments demonstrate that for a 4-way out-of-order superscalar processor Chimaera results in average performance improvements of 21%, assuming a very aggressive core processor design (most pessimistic RFU latency model) and communication overheads from and to the RFU.