Adding Limited Reconfigurability to Superscalar Processors

Authors:
Marc Epalza; Paolo Ienne; Daniel Mlynek
Affiliations:
Signal Processing Institute; Processor Architecture Lab; Signal Processing Institute
Venue:
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Year:
2004

Abstract

When adding reconfigurability to custom hardware, one must take care that the slowdown introduced by the reconfigurable logic does not cancel out the gains obtained through reconfiguration. These gains are greatest in very specific, computation-intensive applications and diminish as applications become more general and heterogeneous. For superscalar processors, this argues for limiting reconfigurability to precise changes in existing functional units rather than adding a fully configurable functional unit. We present a detailed study of the modifications needed in a superscalar processor to allow an FPU to be dynamically reconfigured as several ALUs with a minimal increase in the latency of these functional units. We describe the timing of the FPU's multiplier tree and how the reconfiguration decision is made. Because more than one simple unit is involved, this decision is more global than a cycle-by-cycle reconfiguration and must hold for a longer period of time. We discuss possible policies for the dynamic reconfiguration decisions. The results show gains of up to 56% in the best cases, and average gains of 10%, on typical architectures over a wide range of applications.
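
To make the idea of a reconfiguration decision that holds for a longer period concrete, the following is a minimal sketch of one possible interval-based policy. It is an illustration only and is not taken from the paper: the class name `IntervalPolicy`, the window length `interval`, and the cutoff `fp_threshold` are all hypothetical. The policy counts floating-point instructions over a fixed window and, at the end of each window, either keeps the unit as an FPU or switches it to ALU mode for the whole next window.

```python
from dataclasses import dataclass

FPU_MODE = "fpu"    # unit operates as a floating-point unit
ALU_MODE = "alus"   # unit is reconfigured as several integer ALUs


@dataclass
class IntervalPolicy:
    """Hypothetical interval-based policy (an assumption, not the paper's)."""
    interval: int = 1000         # instructions per decision window (assumed)
    fp_threshold: float = 0.05   # minimum FP fraction to keep the FPU (assumed)
    mode: str = FPU_MODE         # current configuration of the unit
    _fp: int = 0                 # FP instructions seen in the current window
    _total: int = 0              # all instructions seen in the current window

    def observe(self, is_fp: bool) -> str:
        """Record one instruction and return the mode currently in force."""
        self._total += 1
        if is_fp:
            self._fp += 1
        # Decide only at window boundaries, so the choice holds for a whole
        # window rather than changing cycle by cycle.
        if self._total == self.interval:
            fp_fraction = self._fp / self._total
            self.mode = FPU_MODE if fp_fraction >= self.fp_threshold else ALU_MODE
            self._fp = 0
            self._total = 0
        return self.mode
```

Feeding the policy an instruction stream with no floating-point work switches the unit to ALU mode at the end of the first window, and a later window with enough floating-point instructions switches it back. A real policy would also need to account for the cost of reconfiguring, which this sketch ignores.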