SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Compiling for numa parallel machines
Compiling for numa parallel machines
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Area efficient architectures for information integrity in cache memories
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fault-Containment in Cache Memories for TMR Redundant Processor Systems
IEEE Transactions on Computers
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Parameter variations and impact on circuits and microarchitecture
Proceedings of the 40th annual Design Automation Conference
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Performance, energy, and reliability tradeoffs in replicating hot cache lines
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
ISQED '04 Proceedings of the 5th International Symposium on Quality Electronic Design
An integrated hardware/software approach for run-time scratchpad management
Proceedings of the 41st annual Design Automation Conference
Enhancing data cache reliability by the addition of a small fully-associative replication cache
Proceedings of the 18th annual international conference on Supercomputing
Full-chip analysis of leakage power under process variations, including spatial correlations
Proceedings of the 42nd annual Design Automation Conference
Modeling and Testing of SRAM for New Failure Mechanisms Due to Process Variations in Nanoscale CMOS
VTS '05 Proceedings of the 23rd IEEE Symposium on VLSI Test
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Process and environmental variation impacts on ASIC timing
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Variability in sub-100nm SRAM designs
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Adaptive scratch pad memory management for dynamic behavior of multimedia applications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Process variation aware thread mapping for chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
As a result of process parameter variations, a large variability in circuit delay occurs in scaled technologies. This delay or latency variation problem is particularly pressing for memory components due to the minimum sized transistors used to build them. Current memory design techniques mostly cope with such variations by adopting a worst-case design option, which simply assumes all memory locations are operated under the worst possible latency, whereas in reality some memory locations could be much faster than the others. Note that, assuming any other latency value other than the worst-case latency for all memory locations uniformly can lead to reliability problems, since the data may not be ready when the assumed latency has passed. Instead of operating under the worst-case design option, this paper proposes and experimentally evaluates a compilerdriven approach that operates an on-chip scratch-pad memory (SPM) assuming different latencies for the different SPM lines. Our goal is to reduce execution cycles without creating any reliability problems due to variations in access latencies. The proposed scheme achieves its goal by evaluating the reuse of different data items and adopting a reuse and latency aware data-to-SPM placement. It also employs data migration within SPM when it helps to cut down the number of execution cycles further. We also discuss an alternate scheme that can reduce latency of select SPM locations by controlling a circuit level mechanism in software to further improve performance. We implemented our approach within an optimizing compiler and tested its effectiveness through extensive simulations. Our experiments with twelve embedded application codes show that the proposed approach performs much better than the worst-case based design paradigm (16.2% improvement on the average) and comes close (within 5.7%) to an hypothetical bestcase design (i.e., one with no process variation) where every SPM locations uniformly have low latency.