Integrating software caches with scratch pad memory

Authors:
Prasenjit Chakraborty;Preeti Ranjan Panda
Affiliations:
Indian Institute of Technology Delhi, New Delhi, India;Indian Institute of Technology Delhi, New Delhi, India
Venue:
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2012

Abstract

A software cache (SWC) is cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used because memory reference patterns in the source code are hard to analyze. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse; the usual flow is to copy a block or tile of array data into the SPM, process it, and finally copy it back. However, when array index functions are complicated by conditionals, complex expressions, or dependence on run-time data, the SPM compiler has to fall back on expensive DMA transfers for individual words, leading to poor performance. Software caches can play a crucial role in improving performance under such circumstances: their access times are longer than those of direct SPM access, but, like hardware caches, they exploit spatial and temporal locality discovered at run time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space and decides which arrays should be accessed through it, and at which times. An array may be accessed differently in different parts of a program; our algorithm analyzes such uses and routes an array through the SWC only where doing so is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM-based framework and experimented with several applications on a Cell-based machine. Our technique results in up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.
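
To make the idea of a software cache concrete, the following is a minimal sketch in Python. It is not the authors' implementation. The names `SoftwareCache` and `gather_sum`, the direct-mapped read-only organization, and the sizes `LINE_WORDS` and `NUM_LINES` are all assumptions chosen for illustration. The sketch models a run-time-indexed gather and counts simulated DMA transfers, so the result can be compared with fetching one word per access.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4  # words per cache line (assumed size)
NUM_LINES = 8   # lines reserved in SPM for the software cache (assumed size)


@dataclass
class SoftwareCache:
    """Hypothetical direct-mapped, read-only cache emulated in SPM space."""

    main_memory: list  # stands in for off-chip memory
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)
    lines: list = field(default_factory=lambda: [None] * NUM_LINES)
    dma_transfers: int = 0  # one simulated DMA per line fill

    def read(self, index):
        block = index // LINE_WORDS   # which memory block holds the word
        slot = block % NUM_LINES      # direct-mapped placement
        if self.tags[slot] != block:  # miss: fetch the whole line
            start = block * LINE_WORDS
            self.lines[slot] = self.main_memory[start:start + LINE_WORDS]
            self.tags[slot] = block
            self.dma_transfers += 1
        return self.lines[slot][index % LINE_WORDS]


def gather_sum(data, indices):
    """Sum data[i] for run-time indices, reading through the cache."""
    cache = SoftwareCache(data)
    total = sum(cache.read(i) for i in indices)
    return total, cache.dma_transfers
```

For the index sequence `[5, 6, 5, 20, 21, 6]` over `list(range(64))`, this sketch performs 2 line fills, whereas a per-word DMA scheme would issue 6 transfers, one per access. Each `read` still pays for a tag check, which is the extra access time, relative to direct SPM access, that the abstract refers to.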