Modern system architectures sometimes include scratch pad memories (SPM) in their memory hierarchy, taking advantage of their simpler design to meet system area, performance, and power budgets. Systems employing SPM can be broadly categorized as (a) cacheless systems with only SPM, (b) hybrid systems with both cache and SPM, and (c) reconfigurable systems in which local memory can be configured as cache, SPM, or a combination of the two. However, SPM-based systems require more programming effort, mainly because software must explicitly allocate data and orchestrate data transfers. Tight product development cycles demand faster development and porting of diverse applications to multiple SPM-based architectures. In this paper we present SPM-Sieve, a profile-based tool and framework for SPM-based architectures that generates partitioning decisions for the first-level memory in the system hierarchy and suggests a mapping of objects to the memory partitions, without resorting to detailed simulation of all configurations. It does so by natively executing an application with a minimal target architecture specification, which provides early information to guide data organization in the application and also a foundation on which more sophisticated algorithms can produce optimized allocations. We demonstrate the utility and generality of SPM-Sieve by evaluating it on a large number of SPEC2000 benchmarks targeting a 128KB first-level memory. We evaluate its effectiveness through simulation studies that compare the partition suggested by the tool against a range of other partition sizes, and observe that its suggestions are very competitive for SPM-based architectures with and without caches.
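To make the idea of profile-driven partitioning and object mapping concrete, the following is a minimal sketch in Python. It is not the SPM-Sieve algorithm, which the abstract does not describe in detail. It assumes a hypothetical per-object profile (size and access count), uses a simple greedy heuristic that favors objects with the most accesses per byte, and reports only the share of profiled accesses that would be served from SPM for each SPM/cache split. All names (ObjectProfile, map_objects_to_spm, sweep_partitions) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProfile:
    """Hypothetical profile record for one program object."""
    name: str
    size: int       # object size in bytes
    accesses: int   # access count observed during a profiling run


def map_objects_to_spm(profiles, spm_bytes):
    """Greedy mapping (an assumed heuristic, not the paper's method):
    place objects with the highest accesses-per-byte first, skipping
    any object that no longer fits in the remaining SPM space."""
    candidates = [p for p in profiles if p.size > 0]
    ranked = sorted(candidates, key=lambda p: p.accesses / p.size, reverse=True)
    chosen, used = [], 0
    for p in ranked:
        if used + p.size <= spm_bytes:
            chosen.append(p.name)
            used += p.size
    return chosen, used


def sweep_partitions(profiles, total_bytes, step_bytes):
    """For each SPM/cache split of the first-level memory, return
    (spm_bytes, cache_bytes, fraction of profiled accesses served by SPM)."""
    total_accesses = sum(p.accesses for p in profiles)
    by_name = {p.name: p for p in profiles}
    results = []
    for spm_bytes in range(0, total_bytes + 1, step_bytes):
        chosen, _ = map_objects_to_spm(profiles, spm_bytes)
        spm_accesses = sum(by_name[n].accesses for n in chosen)
        share = spm_accesses / total_accesses if total_accesses else 0.0
        results.append((spm_bytes, total_bytes - spm_bytes, share))
    return results
```

Calling sweep_partitions(profiles, 128 * 1024, 16 * 1024) would enumerate splits of a 128KB first-level memory in 16KB steps. A real tool would also need a model of how the remaining accesses behave in the cache portion, which this sketch deliberately leaves out.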