As feature sizes shrink, more transistors can be integrated on a chip. The increased transistor budget can be used to improve the instruction-level parallelism (ILP) that the processor exploits. However, the transistors cannot be used to arbitrarily increase the processor's width and size in the hope of exploiting more ILP. In this paper, we propose an architecture in which the superscalar datapath is tightly coupled with a reconfigurable functional unit (RFU). The RFU is configured to execute frequently executed traces of dynamic instructions. To address data dependencies between instructions in the superscalar datapath and those in the RFU, we propose executing the trace on the RFU with predicted values. When the trace instructions reach the issue queue in the superscalar datapath, the predictions are validated. With this technique, a correct prediction yields a performance improvement, whereas a misprediction incurs no performance degradation. With this architecture, we observe an average instructions per cycle (IPC) improvement of about 11% over the simulated SPEC 2000 benchmarks, using a very small last-value data value predictor.
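The predict-then-validate flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a direct-mapped last-value table, a single live-in value per trace, and a trace modelled as a pure Python function. The names `LastValuePredictor`, `run_trace`, and `trace_fn`, and the table size of 64 entries, are hypothetical and chosen only for illustration.

```python
class LastValuePredictor:
    """Direct-mapped last-value predictor (table size is an assumption)."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [None] * entries  # last value seen per table slot

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        # Returns None until a value has been recorded for this slot.
        return self.table[self._index(pc)]

    def update(self, pc, value):
        self.table[self._index(pc)] = value


def run_trace(predictor, trace_pc, live_in_values, trace_fn):
    """Model repeated executions of one trace with a predicted live-in.

    For each execution, the RFU result is computed early from the
    predicted live-in. The prediction is then validated against the
    actual value. On a match the early result is reused (a hit);
    otherwise the trace is computed from the actual value, as the
    ordinary datapath would have done anyway.
    """
    hits = 0
    results = []
    for actual in live_in_values:
        predicted = predictor.predict(trace_pc)
        speculative = trace_fn(predicted) if predicted is not None else None
        if predicted == actual:          # validation step
            hits += 1
            results.append(speculative)  # early RFU result is usable
        else:
            results.append(trace_fn(actual))  # fall back, no reuse
        predictor.update(trace_pc, actual)
    return hits, results


# Hypothetical trace body: a short chain of dependent operations.
trace_fn = lambda x: x * 2 + 1
hits, results = run_trace(LastValuePredictor(), 0x400, [5, 5, 5, 7, 7], trace_fn)
print(hits, results)
```

In this toy run the live-in value repeats often, so the last-value table predicts correctly on most executions after the first. The results are identical whether or not a prediction hits, which mirrors the claim that mispredictions cost correctness nothing. The sketch does not model timing, so it says nothing about the IPC figures reported above.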