Mini-graph processing

Authors:
Amir Roth;Anne Weinberger Bracy
Affiliations:
University of Pennsylvania;University of Pennsylvania
Venue:
Mini-graph processing
Year:
2008

Abstract

For years, single-thread performance was the dominant force driving processor development. In recent years, however, the poor scaling of single-thread superscalar performance and power concerns, coupled with the ever-increasing number of transistors available on chip, have shifted the focus from single-thread performance to thread-level parallelism on multi-core designs. The trend is for these cores to be narrower, with smaller windows. This dissertation addresses the question of how to maintain, and ideally improve, single-thread performance under such constraints. Mini-graph processing is a form of instruction fusion (the grouping of multiple operations into a single processing unit) that efficiently increases the instructions-per-cycle (IPC) throughput of dynamically scheduled superscalar processors. Mini-graphs are compiler-identified aggregates of multiple instructions that look and behave like singleton instructions at every pipeline stage except execute, where the constituent operations are retrieved and performed serially, microcode style. A mini-graph processor exploits instruction fusion to increase the efficiency of the pipeline stages and structures that perform instruction book-keeping. This dissertation describes a mini-graph architecture and evaluates it using cycle-level simulation. A superscalar processor enhanced with mini-graphs can match the performance otherwise achieved with a wider, deeper superscalar processor. Experiments show that, across four benchmark suites, the addition of mini-graph processing allows a dynamically scheduled 3-wide superscalar processor to match the IPC of a 4-wide superscalar machine.
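
The fusion idea in the abstract can be illustrated with a toy model. The sketch below is not the dissertation's selection algorithm or simulator: the `Op` type, the `fuse_chains` function, the greedy chain-only rule, and the three-operation limit are all assumptions made for illustration. It groups consecutive dependent operations into a single handle and counts how many book-keeping slots the instruction stream occupies before and after fusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One operation: a name, a destination register, and source registers."""
    name: str
    dest: str
    srcs: tuple


def fuse_chains(ops, max_len=3):
    """Greedily group consecutive ops into chains (hypothetical rule).

    An op joins the current group only if it reads the result of the
    group's last op and the group is still shorter than max_len.
    Each resulting group stands for one fused handle.
    """
    groups = []
    for op in ops:
        last = groups[-1] if groups else None
        if last and len(last) < max_len and last[-1].dest in op.srcs:
            last.append(op)
        else:
            groups.append([op])
    return groups


def stage_cycles(slot_count, width):
    """Cycles for slot_count entries to pass a stage that handles `width` per cycle."""
    return -(-slot_count // width)  # ceiling division


# A made-up six-operation sequence (register names are arbitrary).
OPS = [
    Op("add", "r1", ("r2", "r3")),
    Op("ld", "r4", ("r1",)),
    Op("cmp", "r5", ("r4", "r6")),
    Op("sub", "r7", ("r8", "r9")),
    Op("shl", "r10", ("r7",)),
    Op("mul", "r11", ("r12", "r13")),
]

if __name__ == "__main__":
    groups = fuse_chains(OPS)
    print("slots without fusion:", len(OPS))
    print("slots with fusion:   ", len(groups))
    print("group sizes:         ", [len(g) for g in groups])
    print("3-wide stage, fused:   ", stage_cycles(len(groups), 3), "cycle(s)")
    print("4-wide stage, unfused: ", stage_cycles(len(OPS), 4), "cycle(s)")
```

In this toy, each group occupies one slot in the book-keeping stages while its constituent operations would still execute serially. Real mini-graph selection is subject to further constraints (for example on register inputs and outputs) that the sketch ignores, and the slot counts here say nothing about the measured IPC results.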