Impact of die-to-die and within-die parameter variations on the clock frequency and throughput of multi-core processors

Authors:
Keith A. Bowman; Alaa R. Alameldeen; Srikanth T. Srinivasan; Chris B. Wilkerson
Affiliations:
Intel Corporation, Hillsboro, OR (all authors)
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

A statistical performance simulator is developed to explore the impact of parameter variations on the maximum clock frequency (FMAX) and throughput distributions of multi-core processors in a future 22 nm technology. The simulator captures the effects of die-to-die (D2D) and within-die (WID) transistor and interconnect parameter variations on critical path delays in a die. A key component of the simulator is an analytical multi-core processor throughput model which, compared with cycle-accurate performance simulators, enables computationally efficient yet accurate throughput calculations for single-threaded and highly parallel multi-threaded (MT) workloads. Based on microarchitecture designs from previous microprocessors, three multi-core processors with small, medium, or large cores are projected to the 22 nm technology generation to investigate a range of design options. These three multi-core processors are optimized for maximum throughput within a constant die area. A traditional single-core processor is also scaled to the 22 nm technology to provide a baseline comparison. The salient contributions of this paper are: 1) product-level variation analysis for multi-core processors must focus on throughput, rather than just FMAX, and 2) multi-core processors are more variation tolerant than single-core processors due to the larger impact of memory latency and bandwidth on throughput. To illustrate these two points, statistical simulations indicate that multi-core and single-core processors with an equivalent total core area have similar FMAX distributions (mean degradation of 9% and standard deviation of 5%) for MT applications. In contrast to single-core processors, memory latency and bandwidth constraints significantly limit the dependence of throughput on FMAX in multi-core processors, reducing the throughput mean degradation and standard deviation by ∼50% for the small and medium core designs and by ∼30% for the large core design.
This improvement in the throughput distribution indicates that, compared with single-core processors, multi-core processors could significantly reduce the product-design and process-development complexity caused by parameter variations, enabling faster time to market for high-performance microprocessor products.
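As a rough illustration of the two mechanisms the abstract describes, the following Python sketch samples FMAX for many dies under D2D and WID delay variation and then maps each FMAX to throughput. It is not the authors' simulator or their analytical throughput model. All parameter values (SIGMA_D2D, SIGMA_WID, N_PATHS, and the workload mixes) are hypothetical, and the throughput function is a generic first-order assumption in which core cycles scale with clock period while memory stall time does not. Under these assumptions, a memory-bound workload shows a smaller throughput spread than the FMAX spread.

```python
import random
import statistics

# All parameters are hypothetical, chosen for illustration only;
# they are not the values used in the paper.
SIGMA_D2D = 0.04   # D2D delay sigma (fraction of nominal delay), shared by all paths on a die
SIGMA_WID = 0.03   # WID delay sigma, drawn independently for each critical path
N_PATHS = 200      # hypothetical number of independent critical paths per die
N_DIES = 2000      # Monte Carlo sample size


def sample_fmax(rng: random.Random) -> float:
    """Normalized FMAX of one sampled die (nominal = 1.0).

    The D2D term shifts every path on the die by the same amount; the WID
    term varies per path. The slowest path sets the clock period, so
    FMAX = 1 / max(path delay).
    """
    d2d = rng.gauss(0.0, SIGMA_D2D)
    worst = max(1.0 + d2d + rng.gauss(0.0, SIGMA_WID) for _ in range(N_PATHS))
    return 1.0 / worst


def throughput(fmax: float, cpi_core: float, mem_time: float) -> float:
    """Hypothetical first-order throughput (instructions per unit time).

    Core cycles per instruction scale with 1/FMAX, but the memory stall time
    per instruction (mem_time, in nominal-cycle units) does not, so a
    memory-bound workload is less sensitive to FMAX.
    """
    return 1.0 / (cpi_core / fmax + mem_time)


def normalized_stats(values, nominal):
    """Return (mean degradation, standard deviation) relative to nominal."""
    norm = [v / nominal for v in values]
    return 1.0 - statistics.mean(norm), statistics.stdev(norm)


def main() -> None:
    rng = random.Random(42)
    fmax = [sample_fmax(rng) for _ in range(N_DIES)]
    f_deg, f_sd = normalized_stats(fmax, 1.0)
    print(f"FMAX: mean degradation {f_deg:.3f}, std dev {f_sd:.3f}")

    # Two hypothetical workloads: mostly core-bound vs. memory-bound.
    for name, cpi, mem in [("core-bound", 1.0, 0.1), ("memory-bound", 1.0, 1.0)]:
        nominal = throughput(1.0, cpi, mem)
        tput = [throughput(f, cpi, mem) for f in fmax]
        t_deg, t_sd = normalized_stats(tput, nominal)
        print(f"{name}: throughput mean degradation {t_deg:.3f}, std dev {t_sd:.3f}")


if __name__ == "__main__":
    main()
```

Running the script prints the FMAX statistics followed by throughput statistics for each hypothetical workload. The memory-bound case shows the smallest spread because only the core-cycle term scales with clock frequency; with equal core and memory terms, the normalized throughput 2f/(1+f) changes only about half as fast as FMAX near the nominal point.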