To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can diminish if off-chip bandwidth remains as scarce as projected. To understand the relative merits of the different approaches in the face of technology constraints, this work extends prior models of heterogeneous multicores to support U-cores. Unlike prior models that trade performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation about future designs based on scaling trends predicted by the ITRS roadmap. The predictive power of our model depends on U-core-specific parameters derived by measuring the performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provide new insights into their relative merits.
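The abstract does not state the model's equations, so the following is only a minimal sketch of the kind of Amdahl's-law extension it describes, not the authors' formulation. It assumes a chip budget of `n` base-core-equivalent units, one conventional core of size `r` whose sequential performance follows the commonly used square-root (Pollack's rule) approximation, and the remaining `n - r` units given to U-cores that each run parallel work `mu` times faster than a base core. The names `f`, `n`, `r`, and `mu` are hypothetical, and the power and bandwidth limits discussed in the abstract are deliberately left out.

```python
import math


def seq_perf(r: float) -> float:
    """Assumed sequential performance of a core built from r base-core units.

    Uses the square-root (Pollack's rule) approximation; this is an
    assumption of the sketch, not a parameter taken from the abstract.
    """
    return math.sqrt(r)


def hetero_speedup(f: float, n: float, r: float, mu: float) -> float:
    """Amdahl-style speedup over one base core for a heterogeneous chip.

    f  : fraction of the work that is parallelizable (0..1)
    n  : total chip resources in base-core-equivalent units
    r  : resources spent on the single conventional core (0 < r < n)
    mu : assumed relative performance of one unit of U-core area
         on the parallel portion (hypothetical parameter)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if not 0.0 < r < n:
        raise ValueError("r must satisfy 0 < r < n")
    if mu <= 0.0:
        raise ValueError("mu must be positive")

    # Serial portion runs on the conventional core only.
    serial_time = (1.0 - f) / seq_perf(r)
    # Parallel portion runs on the U-cores occupying the remaining area.
    parallel_time = f / (mu * (n - r))
    return 1.0 / (serial_time + parallel_time)
```

Under these assumptions, raising `mu` helps only as much as the parallel fraction allows: with `f = 0.5`, even a very large `mu` cannot push the speedup past `2 * seq_perf(r)`, which is the usual Amdahl ceiling set by the serial portion.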