Comparing FPGA vs. custom CMOS and the impact on processor microarchitecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
For years, single-thread performance was the dominant force driving processor development. In recent years, however, the poor scaling of single-thread superscalar performance and growing power concerns, coupled with the ever-increasing number of transistors available on chip, have shifted the focus from single-thread performance to thread-level parallelism on multi-core designs. The trend is for these cores to be narrower, with smaller windows. This dissertation addresses the question of how to maintain, and ideally improve, single-thread performance under such constraints. Mini-graph processing is a form of instruction fusion (the grouping of multiple operations into a single processing unit) that efficiently increases the instructions-per-cycle (IPC) throughput of dynamically scheduled superscalar processors. Mini-graphs are compiler-identified aggregates of multiple instructions that look and behave like singleton instructions at every pipeline stage except execute, where the constituent operations are retrieved and performed serially, microcode-style. A mini-graph processor exploits instruction fusion to increase the efficiency of the pipeline stages and structures that perform instruction bookkeeping. This dissertation describes a mini-graph architecture and evaluates it using cycle-level simulation. A superscalar processor enhanced with mini-graphs can match the performance otherwise achieved only with a wider, deeper superscalar processor. Experiments show that, across four benchmark suites, adding mini-graph processing allows a dynamically scheduled 3-wide superscalar processor to match the IPC of a 4-wide superscalar machine.
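
The fusion idea in the abstract can be illustrated with a small toy model. The sketch below is not the dissertation's implementation: the `Op` and `MiniGraph` classes, the three-operation ALU table, and the register names are all hypothetical, and the model ignores timing, scheduling, and register-file details. It only shows the two properties the abstract states: an aggregate occupies a single slot in the bookkeeping stages, and its constituent operations are performed serially at execute.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]  # a register name or an immediate value


@dataclass(frozen=True)
class Op:
    """A singleton operation (hypothetical encoding, for illustration only)."""
    name: str
    dst: str
    srcs: Tuple[Operand, Operand]


@dataclass(frozen=True)
class MiniGraph:
    """An aggregate of operations that is tracked as one instruction."""
    ops: Tuple[Op, ...]  # constituent operations, in program order


Slot = Union[Op, MiniGraph]

# Hypothetical ALU table; a real design would support a different operation set.
ALU = {
    "add": lambda a, b: a + b,
    "shl": lambda a, b: a << b,
    "and": lambda a, b: a & b,
}


def bookkeeping_slots(stream: List[Slot]) -> int:
    """Each entry, singleton or aggregate, costs one bookkeeping slot."""
    return len(stream)


def execute(stream: List[Slot], regs: Dict[str, int]) -> Dict[str, int]:
    """Run the stream on a copy of regs; aggregates are expanded serially."""
    regs = dict(regs)
    for slot in stream:
        ops = slot.ops if isinstance(slot, MiniGraph) else (slot,)
        for op in ops:
            a, b = (regs[s] if isinstance(s, str) else s for s in op.srcs)
            regs[op.dst] = ALU[op.name](a, b)
    return regs


# Three dependent operations, first as singletons, then fused into one aggregate.
ops = [
    Op("add", "r3", ("r1", "r2")),
    Op("shl", "r3", ("r3", 2)),
    Op("and", "r4", ("r3", 0xFF)),
]
singleton_stream: List[Slot] = list(ops)
fused_stream: List[Slot] = [MiniGraph(tuple(ops))]
```

In this toy model the fused stream produces the same register state as the singleton stream while occupying one bookkeeping slot instead of three. That is the sense in which fusion lets a narrower pipeline track the same amount of work.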