Cache sensitive code arrangement for virtual machine

Authors:
Chun-Chieh Lin;Chuen-Liang Chen
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Venue:
Transactions on high-performance embedded architectures and compilers III
Year:
2011

Abstract

This paper proposes a systematic approach to optimizing the code layout of a Java ME virtual machine for an embedded system with a cache-sensitive architecture. A practical example is running the JVM directly from NAND flash memory (execute-in-place, XIP), where the cache miss penalty is too high to tolerate. The refined virtual machine generated 96% fewer cache misses than the original version. We developed a mathematical approach that helps predict the flow of the interpreter inside the virtual machine. This approach analyzed both the static control flow graph and the patterns of bytecode instruction streams, since we found that the input sequence drives the program flow of the virtual machine interpreter. We then proposed a rule to model the execution flows of Java instructions in real applications. Furthermore, we used a graph partitioning algorithm to solve the mathematical model, and the resulting partition guided the relocation process in moving program blocks to the proper memory pages. The refinement dramatically improved the locality of the virtual machine and thus reduced cache miss rates. Our technique can help Java ME-enabled devices run faster and extend battery life. The approach also makes it more feasible for designers to integrate the XIP function into a System-on-Chip, thanks to the lower demand for cache memory.
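
The general idea of placing interpreter code by partitioning a graph can be illustrated with a small sketch. The abstract does not spell out the authors' model or partitioning algorithm, so the code below is an assumption-laden toy: it builds an undirected graph whose edge weights count how often one bytecode handler follows another in a trace, then merges handlers greedily along the heaviest edges as long as the merged group fits in one memory page. The handler names, handler sizes, page size, and trace are all hypothetical, and the greedy heuristic stands in for whatever graph partitioning algorithm the paper actually uses. The sketch also ignores the static control flow graph that the paper analyzes alongside the bytecode stream.

```python
from collections import Counter


def transition_weights(trace):
    """Count how often two distinct handlers run back to back (undirected)."""
    weights = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights


def greedy_page_partition(sizes, weights, page_size):
    """Group handlers into pages by merging along the heaviest edges first.

    A merge is skipped when the combined group would exceed page_size.
    This is a simple heuristic, not an optimal partitioner.
    """
    cluster_of = {h: h for h in sizes}    # handler -> cluster id
    members = {h: [h] for h in sizes}     # cluster id -> handlers in it
    used = dict(sizes)                    # cluster id -> bytes occupied
    # Heaviest edges first; ties broken by name so the result is deterministic.
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, _weight in ordered:
        a, b = sorted(edge)
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb or used[ca] + used[cb] > page_size:
            continue
        moved = members.pop(cb)
        for h in moved:
            cluster_of[h] = ca
        members[ca].extend(moved)
        used[ca] += used.pop(cb)
    return list(members.values())


def page_crossings(trace, pages):
    """Count consecutive handler pairs in the trace that lie on different pages."""
    page_of = {h: i for i, page in enumerate(pages) for h in page}
    return sum(page_of[a] != page_of[b] for a, b in zip(trace, trace[1:]))


# Hypothetical inputs: six handlers of 256 bytes each and 512-byte pages.
PAGE_SIZE = 512
handler_sizes = {name: 256 for name in
                 ["goto", "iadd", "iload", "invokevirtual", "istore", "return"]}
# Hypothetical bytecode trace: a hot arithmetic loop plus a few calls.
trace = ["iload", "iload", "iadd", "istore", "goto"] * 50 \
        + ["invokevirtual", "return"] * 5

# Baseline layout: handlers packed two per page in the order listed above.
names = list(handler_sizes)
baseline = [names[i:i + 2] for i in range(0, len(names), 2)]
pages = greedy_page_partition(handler_sizes, transition_weights(trace), PAGE_SIZE)

print("baseline crossings: ", page_crossings(trace, baseline))
print("clustered crossings:", page_crossings(trace, pages))
```

Page crossings are used here only as a rough proxy for cache misses in a page-granular XIP setup; a real evaluation would need a cache simulator or hardware measurements, and the 96% reduction reported in the abstract comes from the authors' own method, not from this sketch.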