Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems

Authors:
O. Ozturk; G. Chen; M. Kandemir; M. Karakoy
Affiliations:
Pennsylvania State University; Pennsylvania State University; Pennsylvania State University; Imperial College, UK
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Abstract

As a result of process parameter variations, circuit delay varies widely in scaled technologies. This delay (latency) variation problem is particularly pressing for memory components because they are built from minimum-sized transistors. Current memory design techniques mostly cope with such variations by adopting a worst-case design option, which simply assumes that all memory locations operate at the worst possible latency, whereas in reality some memory locations could be much faster than others. Note that uniformly assuming any latency other than the worst-case latency for all memory locations can lead to reliability problems, since the data may not be ready when the assumed latency has elapsed. Instead of operating under the worst-case design option, this paper proposes and experimentally evaluates a compiler-driven approach that operates an on-chip scratch-pad memory (SPM) assuming different latencies for different SPM lines. Our goal is to reduce execution cycles without creating any reliability problems due to variations in access latencies. The proposed scheme achieves this goal by evaluating the reuse of different data items and adopting a reuse- and latency-aware data-to-SPM placement. It also employs data migration within the SPM when doing so further reduces the number of execution cycles. We also discuss an alternate scheme that further improves performance by controlling a circuit-level mechanism in software to reduce the latency of selected SPM locations. We implemented our approach within an optimizing compiler and tested its effectiveness through extensive simulations. Our experiments with twelve embedded application codes show that the proposed approach performs much better than the worst-case design paradigm (16.2% improvement on average) and comes close (within 5.7%) to a hypothetical best-case design (i.e., one with no process variation) in which every SPM location uniformly has low latency.
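
To make the idea of a reuse- and latency-aware placement concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes a simplified model in which each data block occupies exactly one SPM line, reuse counts are known at compile time, and blocks that do not fit are served from off-chip memory. The function names, the latency values, and the off-chip latency are all hypothetical. Under these assumptions, pairing the most-reused blocks with the fastest lines minimizes total access cycles.

```python
from typing import Dict, List


def place_by_reuse(reuse: Dict[str, int], line_latency: List[int]) -> Dict[str, int]:
    """Greedy placement (illustrative assumption): most-reused blocks get the fastest lines.

    reuse maps a block name to its access count; line_latency[i] is the
    access latency of SPM line i in cycles. Returns block -> line index.
    Blocks that do not fit in the SPM are left out of the mapping.
    """
    blocks = sorted(reuse, key=lambda b: (-reuse[b], b))            # hottest first
    lines = sorted(range(len(line_latency)),
                   key=lambda i: (line_latency[i], i))              # fastest first
    return dict(zip(blocks, lines))


def total_cycles(reuse: Dict[str, int], line_latency: List[int],
                 placement: Dict[str, int], offchip_latency: int) -> int:
    """Sum access cycles: per-line latency for placed blocks, off-chip otherwise."""
    cycles = 0
    for block, count in reuse.items():
        if block in placement:
            cycles += count * line_latency[placement[block]]
        else:
            cycles += count * offchip_latency
    return cycles


def worst_case_cycles(reuse: Dict[str, int], line_latency: List[int],
                      placement: Dict[str, int], offchip_latency: int) -> int:
    """Same placement, but every SPM line is assumed to run at the slowest latency."""
    uniform = [max(line_latency)] * len(line_latency)
    return total_cycles(reuse, uniform, placement, offchip_latency)


if __name__ == "__main__":
    # Hypothetical inputs: four blocks, three SPM lines with varying latency.
    reuse = {"A": 100, "B": 40, "C": 10, "D": 5}
    line_latency = [3, 1, 2]
    placement = place_by_reuse(reuse, line_latency)
    print(placement)
    print(total_cycles(reuse, line_latency, placement, offchip_latency=20))
    print(worst_case_cycles(reuse, line_latency, placement, offchip_latency=20))
```

With these made-up inputs, block A lands on the 1-cycle line and block D stays off-chip. The latency-aware count is lower than the worst-case count for the same placement, which is the kind of gap the abstract describes. Data migration within the SPM and the circuit-level latency control are not modeled here.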