An efficient method of computing static single assignment form
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
Hardware/software instruction set configurability for system-on-chip processors
Proceedings of the 38th annual Design Automation Conference
CryptoManiac: a fast flexible architecture for secure communication
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
IEEE Micro
RaPiD - Reconfigurable Pipelined Datapath
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Automatic Architectural Synthesis of VLIW and EPIC Processors
Proceedings of the 12th international symposium on System synthesis
Energy characterization of a tiled architecture processor with on-chip networks
Proceedings of the 2003 international symposium on Low power electronics and design
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Evaluating the Imagine Stream Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
The Cell Processor Architecture
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Tartan: evaluating spatial computation for whole program execution
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficient architectures through application clustering and architectural heterogeneity
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
ACM Transactions on Computer Systems (TOCS)
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
An energy-efficient patchable accelerator for post-silicon engineering changes
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Reflex: using low-power processors in smartphones without knowing them
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Idempotent processor architecture
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse
Proceedings of the 49th Annual Design Automation Conference
Power Limitations and Dark Silicon Challenge the Future of Multicore
ACM Transactions on Computer Systems (TOCS)
The yin and yang of power and performance for asymmetric hardware and managed software
Proceedings of the 39th Annual International Symposium on Computer Architecture
Harmony: collection and analysis of parallel block vectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
Operating systems should manage accelerators
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems (TOCS)
Power challenges may end the multicore era
Communications of the ACM
Location, location, location: the role of spatial locality in asymptotic energy minimization
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Computational sprinting on a hardware/software testbed
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Neural Acceleration for General-Purpose Approximate Programs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A hardware unit for fast SAH-optimised BVH construction
ACM Transactions on Graphics (TOG) - SIGGRAPH 2013 Conference Proceedings
A general constraint-centric scheduling framework for spatial architectures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Active disk meets flash: a case for intelligent SSDs
Proceedings of the 27th international ACM conference on International conference on supercomputing
Exploring hardware overprovisioning in power-constrained, high performance computing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Flicker: a dynamically adaptive architecture for power limited multicore systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Convolution engine: balancing efficiency & flexibility in specialized computing
Proceedings of the 40th Annual International Symposium on Computer Architecture
LINQits: big data on little clients
Proceedings of the 40th Annual International Symposium on Computer Architecture
The ITRS design technology and system drivers roadmap: process and status
Proceedings of the 50th Annual Design Automation Conference
Lighting the dark silicon by exploiting heterogeneity on future processors
Proceedings of the 50th Annual Design Automation Conference
Systematic evaluation of workload clustering for extremely energy-efficient architectures
ACM SIGARCH Computer Architecture News
ACM Transactions on Architecture and Code Optimization (TACO)
ThermOS: system support for dynamic thermal management of chip multi-processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Exposing ILP in custom hardware with a dataflow compiler IR
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
OmpSs@Zynq all-programmable SoC ecosystem
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
Accelerating an application domain with specialized functional units
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Bitcoin and the age of bespoke silicon
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
A generalized software framework for accurate and efficient management of performance goals
Proceedings of the Eleventh ACM International Conference on Embedded Software
Optimization of interconnects between accelerators and shared memories in dark silicon
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.02 |
Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.