Mini-graph processing

  • Authors:
  • Amir Roth;Anne Weinberger Bracy

  • Affiliations:
  • University of Pennsylvania;University of Pennsylvania

  • Venue:
  • Mini-graph processing
  • Year:
  • 2008

Abstract

For years, single-thread performance was the dominant force driving processor development. In recent years, however, the poor scaling of single-thread superscalar performance and power concerns, coupled with the ever-increasing number of transistors available on chip, have shifted the focus from single-thread performance to thread-level parallelism running on multi-core designs. The trend is for these cores to be narrower, with smaller windows. This dissertation addresses the question of how to maintain, and ideally improve, single-thread performance under such constraints. Mini-graph processing is a form of instruction fusion (the grouping of multiple operations into a single processing unit) that efficiently increases the instructions-per-cycle (IPC) throughput of dynamically scheduled superscalar processors. Mini-graphs are compiler-identified aggregates of multiple instructions that look and behave like singleton instructions at every pipeline stage except execute, where the constituent operations are retrieved and performed serially, microcode style. A mini-graph processor exploits instruction fusion to increase the efficiency of the pipeline stages and structures that perform instruction book-keeping. This dissertation describes a mini-graph architecture and evaluates it using cycle-level simulation. A superscalar processor enhanced with mini-graphs can match the performance otherwise achieved with a wider, deeper superscalar processor. Experiments show that, across four benchmark suites, the addition of mini-graph processing allows a dynamically scheduled 3-wide superscalar processor to match the IPC of a 4-wide superscalar machine.
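
The book-keeping argument in the abstract can be illustrated with a toy count. The sketch below is not the dissertation's simulator or method: the `Slot` type, the 12-operation stream, and the choice of three two-operation mini-graphs are all made up for illustration. It only counts how many cycles a width-limited book-keeping stage needs to handle a stream, and it ignores dependences, the serial execution of constituent operations, and every other effect a cycle-level simulation would capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """One book-keeping entry: a singleton instruction or a mini-graph handle."""
    ops: List[str]  # constituent operations; exactly one for a singleton


def total_ops(stream: List[Slot]) -> int:
    """Number of operations the stream actually carries."""
    return sum(len(slot.ops) for slot in stream)


def bookkeeping_cycles(stream: List[Slot], width: int) -> int:
    """Cycles a width-limited stage needs to handle every slot (ceiling division)."""
    return -(-len(stream) // width)


# Hypothetical 12-operation stream, one operation per slot (no fusion).
unfused = [Slot([f"op{i}"]) for i in range(12)]

# The same 12 operations, assuming three two-operation mini-graphs were found.
fused = (
    [Slot([f"op{i}", f"op{i + 1}"]) for i in (0, 2, 4)]
    + [Slot([f"op{i}"]) for i in range(6, 12)]
)

for name, stream, width in [
    ("unfused, 3-wide", unfused, 3),
    ("unfused, 4-wide", unfused, 4),
    ("fused,   3-wide", fused, 3),
]:
    print(name, "slots:", len(stream), "ops:", total_ops(stream),
          "cycles:", bookkeeping_cycles(stream, width))
```

In this made-up stream, fusion shrinks 12 slots to 9 while still carrying 12 operations, so the 3-wide stage handles the fused stream in as many cycles as the 4-wide stage needs for the unfused one. Real coverage depends on how many mini-graphs the compiler can identify, which this sketch simply assumes.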