An automatic overlay generator
IBM Journal of Research and Development
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A software coherence scheme with the assistance of directories
ICS '91 Proceedings of the 5th international conference on Supercomputing
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Linkers and Loaders
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Dynamic overlay of scratchpad memory for energy minimization
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Procedure placement using temporal-ordering information: dealing with code size expansion
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
A dynamic code placement technique for scratchpad memory using postpass optimization
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
The revenge of the overlay: automatic compaction of OS kernel code via on-demand code loading
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Scratchpad memory management in a multitasking environment
EMSOFT '08 Proceedings of the 8th ACM international conference on Embedded software
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
SDRM: simultaneous determination of regions and function-to-region mapping for scratchpad memories
HiPC'08 Proceedings of the 15th international conference on High performance computing
Hi-index | 0.00 |
The explicitly-managed memory hierarchies, where a hierarchy of distinct memories is exposed to the programmer and managed explicitly by software, are not only found in typical embedded processors but also found in a class of high performance multicore architectures. Code overlay techniques have been widely used to execute a program whose code is bigger than the available code memory in the system. To generate an efficient overlaid executable with maximum storage savings as well as minimum performance overhead, the overlay structure should be designed carefully. In this paper, we propose an efficient code overlay technique that automatically generates an overlay structure for a given memory size for multicores with explicitly-managed memory hierarchies. We observe that finding an efficient overlay structure with minimum memory copying and run-time check overhead is similar to the problem that finds a code placement with minimum conflict misses in the instruction cache. Our algorithm exploits the temporal-ordering information between functions during program execution. The information is obtained from profiling the program. Experimental results with 11 parallel applications on the Cell BE processor indicate that our approach is effective and promising.