Process variation aware thread mapping for chip multiprocessors

Authors:
S. Hong;S. H. K. Narayanan;M. Kandemir;Ö. Özturk
Affiliations:
The Pennsylvania State University;The Pennsylvania State University;The Pennsylvania State University;Bilkent University
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2009

Abstract

With the continued scaling of manufacturing technology, process variation has become more prevalent. As a result, in Chip Multiprocessors (CMPs) for example, identically designed processor cores on the same chip can have non-identical peak frequencies and power consumptions. One way to cope with such a design is to assume that every processor runs at the frequency of the slowest processor, which wastes computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto the available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using the chip-wide lowest frequency among all cores and the highest cache latency. Experimental evidence shows that, compared to a process-variation-agnostic thread mapping strategy, our proposed scheme achieves as much as a 29% improvement in overall execution latency, with an average improvement of 13% over the benchmarks tested. We also demonstrate that these savings are consistent across different processor counts, latency maps, and latency distributions.
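
To make the contrast in the abstract concrete, the following is a minimal sketch and not the algorithm from the paper, whose details are not given here. It assumes a simple model in which each thread has a fixed amount of work in cycles and each core has a known peak frequency, and it uses a plain greedy heuristic as a stand-in for the mapping step. The names `makespan`, `variation_aware_mapping`, `work`, and `freqs` are hypothetical, the numbers are made up for illustration, and runtime remapping and cache latency are not modeled.

```python
def makespan(work, freqs, mapping):
    """Time at which the last core finishes: assigned cycles / core frequency."""
    load = [0.0] * len(freqs)
    for thread, core in enumerate(mapping):
        load[core] += work[thread]
    return max(cycles / f for cycles, f in zip(load, freqs))


def variation_aware_mapping(work, freqs):
    """Greedy stand-in (an assumption, not the paper's method): take threads
    from heaviest to lightest and place each on the core that would finish
    it earliest given that core's peak frequency and current load."""
    finish = [0.0] * len(freqs)
    mapping = [None] * len(work)
    for t in sorted(range(len(work)), key=lambda i: -work[i]):
        core = min(range(len(freqs)), key=lambda c: finish[c] + work[t] / freqs[c])
        finish[core] += work[t] / freqs[core]
        mapping[t] = core
    return mapping


# Illustrative inputs only: work in giga-cycles, peak frequencies in GHz.
work = [8.0, 6.0, 4.0, 2.0]
freqs = [2.0, 1.6, 1.2, 1.0]

# Baseline from the abstract: every core clocked at the slowest core's frequency.
slowest = [min(freqs)] * len(freqs)
baseline = makespan(work, slowest, [0, 1, 2, 3])

# Variation-aware: each core runs at its own peak frequency.
mapping = variation_aware_mapping(work, freqs)
aware = makespan(work, freqs, mapping)

print(mapping, baseline, aware)
```

Under these made-up inputs the heaviest thread lands on the fastest core, so the chip finishes when that core does instead of waiting on a uniformly slow clock. The size of the gap depends entirely on the assumed workloads and frequencies and says nothing about the 29% and 13% figures reported above.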