Poor code locality degrades application performance by increasing memory stalls due to instruction cache and TLB misses. The problem is particularly acute for large server applications written in languages such as Java and C#, whose runtimes provide just-in-time (JIT) compilation, dynamic class loading, and dynamic recompilation. However, managed runtimes also offer an opportunity to dynamically profile applications and adapt them to improve their performance. This paper describes a Dynamic Code Management system (DCM) in a managed runtime that performs whole-program code layout optimizations to improve instruction locality.

We begin by implementing the widely used Pettis-Hansen algorithm for method layout to improve code locality. Unfortunately, this algorithm is too costly for a dynamic optimization system: its running time is O(n^3) in the size of the call graph. For example, Pettis-Hansen requires a prohibitively expensive 35 minutes to lay out MiniBean, which has 15,586 methods. We propose three new code placement algorithms that target ITLB misses, which typically have the greatest impact on performance. The best of these algorithms, Code Tiling, groups methods into page-sized tiles by performing a depth-first traversal of the call graph, ordered by call frequency. Excluding overhead, experimental results show that DCM with Code Tiling improves performance by 6% on the large MiniBean benchmark over a baseline that orders methods by invocation order, whereas Pettis-Hansen placement improves performance by only 2% over the same baseline. Furthermore, Code Tiling lays out all 15,586 MiniBean methods in just 0.35 seconds (6,000 times faster than Pettis-Hansen), which makes it suitable for high-performance managed runtimes.
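
The abstract describes Code Tiling only in one sentence, so the following Python sketch is an illustration under stated assumptions rather than the paper's implementation. It assumes a call graph given as per-edge call counts, per-method code sizes in bytes, a caller-supplied list of root methods, and a 4096-byte page; the function name `code_tiling`, its parameters, and the greedy packing rule are all hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


def code_tiling(method_sizes, call_counts, roots, page_size=PAGE_SIZE):
    """Return a list of tiles; each tile is a list of method names.

    method_sizes: {method: code size in bytes}
    call_counts:  {(caller, callee): observed call frequency}
    roots:        entry methods where traversal starts (assumed input)
    """
    # Build adjacency lists with callees sorted hottest-first.
    callees = defaultdict(list)
    for (caller, callee), count in call_counts.items():
        callees[caller].append((count, callee))
    for caller in callees:
        callees[caller].sort(key=lambda pair: (-pair[0], pair[1]))

    # Iterative depth-first traversal that follows the hottest edge first.
    # Methods unreachable from the roots are appended afterwards.
    order, visited = [], set()
    for start in list(roots) + sorted(method_sizes):
        stack = [start]
        while stack:
            method = stack.pop()
            if method in visited or method not in method_sizes:
                continue
            visited.add(method)
            order.append(method)
            # Push coldest first so the hottest callee is popped next.
            for _, callee in reversed(callees[method]):
                if callee not in visited:
                    stack.append(callee)

    # Greedily pack methods, in traversal order, into page-sized tiles.
    # A method larger than a page ends up alone in its own tile.
    tiles, current, used = [], [], 0
    for method in order:
        size = method_sizes[method]
        if current and used + size > page_size:
            tiles.append(current)
            current, used = [], 0
        current.append(method)
        used += size
    if current:
        tiles.append(current)
    return tiles


# Made-up example data, not taken from the paper.
sizes = {"main": 1000, "parse": 2000, "eval": 1500, "log": 500, "gc": 3000}
calls = {("main", "eval"): 90, ("main", "parse"): 10, ("main", "gc"): 1,
         ("eval", "log"): 5, ("parse", "log"): 50}
print(code_tiling(sizes, calls, roots=["main"]))
# → [['main', 'eval', 'log'], ['parse'], ['gc']]
```

In this sketch each method is visited once and each call edge is examined once after sorting, which is consistent in spirit with the sub-second layout time the abstract reports, although the paper's actual data structures and tie-breaking rules may differ.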