Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

Authors:
Eric S. Chung;Peter A. Milder;James C. Hoe;Ken Mai
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010


Abstract

To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can diminish given the projected scarcity of bandwidth in future technologies. To understand the relative merits of the different approaches in the face of technology constraints, this work extends prior models of heterogeneous multicores to support U-cores. Unlike prior models that trade off performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation about future designs based on scaling trends predicted by the ITRS roadmap. The predictive power of our model depends on U-core-specific parameters derived by measuring the performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provide new insights into their relative merits.
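The kind of modeling the abstract refers to can be illustrated with a minimal sketch in the style of Amdahl's-law multicore models. Everything here is an assumption made for illustration, not the paper's actual equations: chip area is counted in hypothetical base-core-equivalent (BCE) units, sequential performance is assumed to grow as the square root of core area (Pollack's rule), and a hypothetical parameter `mu` stands for a U-core's throughput per BCE of area relative to a base core. Power and bandwidth limits, which the abstract says the full model accounts for, are deliberately left out.

```python
import math


def perf_seq(r: float) -> float:
    """Sequential performance of a core built from r BCEs.

    Assumption: Pollack's rule, performance ~ sqrt(area).
    """
    return math.sqrt(r)


def speedup_symmetric(f: float, n: float, r: float) -> float:
    """Speedup of a chip with n BCEs split into n/r identical cores.

    f is the parallelizable fraction of the workload.
    """
    serial_time = (1.0 - f) / perf_seq(r)
    parallel_time = f * r / (perf_seq(r) * n)
    return 1.0 / (serial_time + parallel_time)


def speedup_ucore(f: float, n: float, r: float, mu: float) -> float:
    """Speedup of one conventional core (r BCEs) plus U-core fabric.

    Assumption: the serial part runs on the conventional core, and the
    parallel part runs only on the remaining (n - r) BCEs of U-core
    area, each delivering mu times the throughput of a base core.
    """
    serial_time = (1.0 - f) / perf_seq(r)
    parallel_time = f / (mu * (n - r))
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    for f in (0.5, 0.9, 0.99):
        sym = speedup_symmetric(f, n=64, r=4)
        het = speedup_ucore(f, n=64, r=4, mu=10.0)
        print(f"f={f}: symmetric={sym:.2f}, u-core={het:.2f}")
```

In this simplified form, a large `mu` helps only the parallel term, so the benefit of a U-core shrinks as the serial fraction `1 - f` grows. A fuller model would also cap the parallel term by the available power and bandwidth budgets.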