Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization

  • Authors:
  • Nathan Clark;Manjunath Kudlur;Hyunchul Park;Scott Mahlke;Krisztian Flautner

  • Affiliations:
  • University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;ARM Ltd., UK

  • Venue:
  • Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces the length of computation as well as the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain. While new instructions can be designed automatically, there is a substantial amount of engineering cost incurred to verify and to implement the final custom processor. In this work, we propose a strategy to transparent customization of the core computation capabilities of the processor without changing its instruction set. A congurable array of function units is added to the baseline processor that enables the acceleration of a wide range of data flow subgraphs. To exploit the array, the microarchitecture performs subgraph identification at run-time, replacing them with new microcode instructions to configure and utilize the array. We compare the effectiveness of replacing subgraphs in the fill unit of a trace cache versus using a translation table during decode, and evaluate the tradeoffs between static and dynamic identification of subgraphs for instruction set customization.