Multicore designers often add a small local memory close to each core to speed up access and to reduce off-chip I/O. This approach, however, places a burden on the programmer, the compiler, and the runtime system: the local memory lacks hardware support (cache logic, an MMU, etc.) and must therefore be managed in software to exploit its performance potential. The IBM Cell Broadband Engine (Cell B.E.) is extreme in this respect, since each of its parallel cores can directly address code and data only in its own local memory. Overlay techniques from the 1970s solve this problem, but with well-known drawbacks: the programmer must manually divide the program into overlays, and the largest overlay determines how much data the application can work with. With our approach, programmers no longer need to cut overlays. Instead, we automatically fragment the program at runtime and load small code snippets into a code cache that resides in the local stores and is supervised by a garbage collector. Since our loader does not load code that is not needed for execution, the code cache can be much smaller (up to 70%) than the original program size; applications can therefore work on larger data sets, i.e., bigger problems. Our loader is highly efficient and slows down applications by less than 5% on average, and it can load any native code without pre-processing or changes in the software tool chain.