Hardware/software approaches for reducing the process variation impact on instruction fetches

Authors:
Ismail Kadayif; Mahir Turkcan; Seher Kiziltepe; Ozcan Ozturk
Affiliations:
Canakkale Onsekiz Mart University, Turkey; Canakkale Onsekiz Mart University, Turkey; Canakkale Onsekiz Mart University, Turkey; Bilkent University, Turkey
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Year:
2013

Abstract

As technology moves toward finer process geometries, it is becoming extremely difficult to control critical physical parameters such as channel length, gate oxide thickness, and dopant ion concentration. Variations in these parameters lead to dramatic variations in access latencies in Static Random Access Memory (SRAM) devices, so different lines of the same cache may have different access latencies. A simple solution is to adopt the worst-case latency paradigm, in which every line is assumed to be as slow as the slowest one. While this egalitarian cache management is simple, it may introduce significant performance overhead during instruction fetches, when both address translation (an instruction Translation Lookaside Buffer (TLB) access) and instruction cache access take place, making it infeasible for future high-performance processors. In this study, we first propose hardware and software enhancements and then, building on them, investigate several techniques to mitigate the effect of process variation on the instruction fetch pipeline stage in modern processors. For address translation, we study an approach that performs the virtual-to-physical page translation once and stores the result in a special register, reusing it as long as execution remains on the same instruction page. To handle varying access latencies across instruction cache lines, we embed cache access latency annotations in the instructions themselves, giving the circuitry a hint about how long to wait for the next instruction to become available.
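
The two ideas in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the page size, the function names `itlb_lookups` and `fetch_cycles`, and the per-line latency table are all assumptions made for illustration, and the model assumes every fetch has an accurate latency hint available.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def itlb_lookups(fetch_addrs, page_size=PAGE_SIZE):
    """Count iTLB lookups when the last translation is kept in a register.

    A lookup is needed only when a fetch leaves the page whose
    translation is currently held in the (hypothetical) register.
    """
    current_page = None  # models the special register; empty at start
    lookups = 0
    for addr in fetch_addrs:
        page = addr // page_size
        if page != current_page:
            lookups += 1          # translate once and latch the result
            current_page = page
    return lookups


def fetch_cycles(line_sequence, line_latency, use_hints):
    """Total cache access cycles for a sequence of fetched cache lines.

    With hints, each fetch waits only for its own line's latency.
    Without hints, every fetch waits for the slowest line (worst case).
    """
    if use_hints:
        return sum(line_latency[line] for line in line_sequence)
    worst = max(line_latency.values())
    return worst * len(line_sequence)


# Six fetches touching pages 1, 1, 1, 2, 2, 1: three page changes.
addrs = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004, 0x1000]
print(itlb_lookups(addrs))  # 3 lookups instead of 6

# Hypothetical per-line latencies in cycles; line 1 is the slow one.
latency = {0: 1, 1: 2, 2: 1}
sequence = [0, 1, 2, 0]
print(fetch_cycles(sequence, latency, use_hints=True))   # 5
print(fetch_cycles(sequence, latency, use_hints=False))  # 8
```

In this toy run, reusing the stored translation cuts TLB lookups from six to three, and per-line hints reduce the total access time from eight cycles to five. Real savings depend on page-crossing frequency and on the actual latency distribution across cache lines.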