Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Computer
Parameter variations and impact on circuits and microarchitecture
Proceedings of the 40th annual Design Automation Conference
A process-tolerant cache architecture for improved yield in nanoscale technologies
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 42nd annual Design Automation Conference
The impact of device parameter variations on the frequency and performance of VLSI chips
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Process variation aware cache leakage management
Proceedings of the 2006 international symposium on Low power electronics and design
Power efficiency for variation-tolerant multicore processors
Proceedings of the 2006 international symposium on Low power electronics and design
Performance and yield enhancement of FPGAs with within-die variation using multiple configurations
Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
Compiler-Directed Variable Latency Aware SPM Management to CopeWith Timing Problems
Proceedings of the International Symposium on Code Generation and Optimization
Comparative analysis of conventional and statistical design techniques
Proceedings of the 44th annual Design Automation Conference
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
OpenMP Support for NBTI-Induced Aging Tolerance in MPSoCs
SSS '09 Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems
Characterizing the impact of process variation on 45 nm NoC-based CMPs
Journal of Parallel and Distributed Computing
Process variation-aware routing in NoC based multicores
Proceedings of the 48th Design Automation Conference
Variability-tolerant workload allocation for MPSoC energy minimization under real-time constraints
ACM Transactions on Embedded Computing Systems (TECS)
Dynamic thread mapping based on machine learning for transactional memory applications
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
HW-SW integration for energy-efficient/variability-aware computing
Proceedings of the Conference on Design, Automation and Test in Europe
Mapping on multi/many-core systems: survey of current and emerging trends
Proceedings of the 50th Annual Design Automation Conference
Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
International Journal of Parallel Programming
Hi-index | 0.00 |
With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each processor can be assumed to run at the frequency of the slowest processor, resulting in wasted computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using chip-wide lowest frequency amongst all cores and highest cache latency. Experimental evidence shows that, as compared to a process variation agnostic thread mapping strategy, our proposed scheme achieves as much as 29% improvement in overall execution latency, average improvement being 13% over the benchmarks tested. We also demonstrate in this paper that our savings are consistent across different processor counts, latency maps, and latency distributions. With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each processor can be assumed to run at the frequency of the slowest processor, resulting in wasted computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using chip-wide lowest frequency amongst all cores and highest cache latency. Experimental evidence shows that, as compared to a process variation agnostic thread mapping strategy, our proposed scheme achieves as much as 29% improvement in overall execution latency, average improvement being 13% over the benchmarks tested. We also demonstrate in this paper that our savings are consistent across different processor counts, latency maps, and latency distributions.