Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
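The abstract does not say how the closing gap figure is derived. One reading that matches the stated numbers, assuming the target of doubled performance compounds over the five technology generations studied, is sketched below. The constant and function names are illustrative and do not come from the paper.

```python
# Hedged sketch: reproduces the arithmetic behind the abstract's closing
# figures. Assumption (not stated in the abstract): "doubled performance per
# generation" compounds over the five technology generations studied.

GENERATIONS = 5          # "the next five technology generations"
PROJECTED_SPEEDUP = 7.9  # average speedup reported through 2024


def target_speedup(generations: int) -> float:
    """Speedup expected if performance doubles every generation."""
    return 2.0 ** generations


target = target_speedup(GENERATIONS)

# Two ways to express the shortfall against the target.
gap_as_difference = target - PROJECTED_SPEEDUP  # target minus projection
gap_as_ratio = target / PROJECTED_SPEEDUP       # target divided by projection

print(f"target speedup:      {target:.1f}x")
print(f"gap as a difference: {gap_as_difference:.1f}x")
print(f"gap as a ratio:      {gap_as_ratio:.2f}x")
```

Under this assumption, the "nearly 24-fold gap" corresponds to the difference between the 32x target and the 7.9x projection. Expressed as a ratio, the projection falls short of the target by roughly a factor of four.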