Comparing FPGA vs. custom CMOS and the impact on processor microarchitecture

Authors:
Henry Wong; Vaughn Betz; Jonathan Rose
Affiliations:
University of Toronto, Toronto, ON, Canada; Altera Corp., Toronto, ON, Canada; University of Toronto, Toronto, ON, Canada
Venue:
Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays
Year:
2011

Abstract

As soft processors are increasingly used in diverse applications, their microarchitectures need to evolve to suit the FPGA implementation substrate. This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates. We then use the results of these comparisons to infer how the microarchitecture of soft processors on FPGAs should differ from that of hard processors on custom CMOS. We find that the FPGA-to-custom-CMOS area ratios of the different building blocks vary significantly more than the corresponding delay ratios. As area is often a key design constraint in FPGA circuits, area ratios have the most impact on microarchitecture choices. Complete processor cores have area ratios of 17-27x and delay ratios of 18-26x. Building blocks that have dedicated hardware support on FPGAs, such as SRAMs, adders, and multipliers, are particularly area-efficient (2-7x area ratio), while multiplexers and CAMs are particularly area-inefficient (100x area ratio). This leads to cheaper ALUs, larger caches of low associativity, and more expensive bypass networks than on similar hard processors. We also find that a low delay ratio for pipeline latches (12-19x) suggests that soft processors should have pipeline depths 20% greater than hard processors of similar complexity.
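
The ratios quoted in the abstract can be compared against the whole-core ratios to see which building blocks are relatively cheap or expensive on an FPGA. The sketch below is illustrative only: the range values are copied from the abstract, but the midpoint summary, the `relative_cost` metric, and the `scaled_depth` helper are assumptions introduced here, not the authors' method.

```python
# FPGA / custom CMOS ratios as (low, high) ranges, copied from the abstract.
CORE_AREA = (17.0, 27.0)    # complete processor cores, area
CORE_DELAY = (18.0, 26.0)   # complete processor cores, delay
LATCH_DELAY = (12.0, 19.0)  # pipeline latches, delay
BLOCK_AREA = {
    "SRAMs, adders, multipliers": (2.0, 7.0),
    "multiplexers and CAMs": (100.0, 100.0),
}


def midpoint(ratio_range):
    """Summarize a (low, high) range by its midpoint (an assumed simplification)."""
    low, high = ratio_range
    return (low + high) / 2.0


def relative_cost(block_range, baseline_range):
    """Hypothetical metric: block ratio midpoint over baseline ratio midpoint.

    Values below 1 suggest the block is relatively cheaper on an FPGA than
    the baseline; values above 1 suggest it is relatively more expensive.
    """
    return midpoint(block_range) / midpoint(baseline_range)


def scaled_depth(hard_depth, increase=0.20):
    """Apply the abstract's suggested 20% pipeline depth increase."""
    return hard_depth * (1.0 + increase)


if __name__ == "__main__":
    for name, area in BLOCK_AREA.items():
        print(f"{name}: relative area cost {relative_cost(area, CORE_AREA):.2f}")
    latch = relative_cost(LATCH_DELAY, CORE_DELAY)
    print(f"pipeline latches: relative delay cost {latch:.2f}")
    print(f"10-stage hard pipeline -> about {scaled_depth(10):.0f} stages")
```

Under this assumed metric, the dedicated blocks land well below 1 and the multiplexers and CAMs well above it, which matches the abstract's direction: cheaper ALUs and larger caches, but more expensive bypass networks. The latch delay figure below 1 is the same observation that motivates deeper pipelines.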