QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores

Authors:
Ganesh Venkatesh;Jack Sampson;Nathan Goulding-Hotta;Sravanthi Kota Venkata;Michael Bedford Taylor;Steven Swanson
Affiliations:
University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego
Venue:
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2011

Abstract

Transistor density continues to increase exponentially, but power dissipation per transistor improves only slightly with each technology generation. Given constant chip-level power budgets, the percentage of transistors that can switch at full frequency therefore decreases exponentially with each generation. Hence, while the transistor budget continues to increase exponentially, the power budget has become the dominant limiting factor in processor design. In this regime, using transistors to build specialized cores that optimize energy per computation becomes an effective way to improve system performance. To trade transistors for energy efficiency in a scalable manner, we propose Quasi-specific Cores, or QsCores: specialized processors capable of executing multiple general-purpose computations while providing an order of magnitude better energy efficiency than a general-purpose processor. The QsCores design flow is based on the insight that similar code patterns exist within and across applications. Our approach exploits these similar code patterns to ensure that a small set of specialized cores supports a large number of commonly used computations. We evaluate QsCores' ability to target both a single application library (e.g., data structures) and a diverse workload consisting of applications selected from different domains (e.g., SPECINT, EEMBC, and Vision). Our results show that QsCores can provide 18.4x better energy efficiency than general-purpose processors while reducing the amount of specialized logic required to support the workload by up to 66%.
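
The scaling argument at the start of the abstract can be illustrated with a small back-of-the-envelope sketch. The numbers below are assumptions chosen only for illustration, not figures from the paper: transistor count is assumed to double each generation, per-transistor power at full frequency is assumed to fall to 0.7x of its previous value, and the chip power budget is held constant. The function name `active_fraction` and its parameters are hypothetical.

```python
def active_fraction(generations, density_growth=2.0, power_scale=0.7):
    """Fraction of transistors that can switch at full frequency.

    Assumptions (illustrative only, not taken from the paper):
    - at generation 0 the whole chip exactly fits the power budget;
    - each generation multiplies transistor count by `density_growth`;
    - each generation multiplies per-transistor power by `power_scale`;
    - the chip-level power budget stays constant.
    """
    # Power needed to run every transistor at full frequency, relative to
    # the fixed budget, grows by (density_growth * power_scale) per generation.
    demand = (density_growth * power_scale) ** generations
    # Only budget / demand of the chip can be active; cap at 100%.
    return min(1.0, 1.0 / demand)


if __name__ == "__main__":
    for g in range(5):
        print(f"generation {g}: {active_fraction(g):.1%} of transistors active")
```

Under these assumed factors the usable fraction shrinks geometrically by 1/1.4 per generation, which is the trend that motivates spending otherwise idle transistors on specialized, energy-efficient cores.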