A comparative performance evaluation of write barrier implementation
OOPSLA '92 conference proceedings on Object-oriented programming systems, languages, and applications
Compiler techniques for data prefetching on the PowerPC
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Reconciling responsiveness with performance in pure object-oriented languages
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
A dynamic optimization framework for a Java just-in-time compiler
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Effective compile-time analysis for data prefetching in java
Effective compile-time analysis for data prefetching in java
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
The garbage collection advantage: improving program locality
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Profile-guided proactive garbage collection for locality optimization
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Hi-index | 0.00 |
Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and uses the identified stride patterns to guide the compilation process by injecting the prefetch instructions at appropriate places. There are some researches trying to enable stride prefetching in runtime systems with online profiling, but they either cannot discover cross-procedural prefetch opportunity, or require special supports in hardware or garbage collection. In this paper, we present a prefetch engine for JVM (Java Virtual Machine). It firstly identifies the candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover any stride patterns, then modifies the compiled code to inject the prefetch instructions in place of the instrumentations. One of the major advantages of this engine is that, it can detect striding loads in any virtual code places for both regular and irregular code, not being limited with plain loop or procedure scopes. Actually we found the cross-procedural patterns take about 30% of all the prefetchings in the representative Java benchmarks. Another major advantage of the engine is that it has runtime overhead much smaller (the maximal is less than 4.0%) than the benefits it brings. Our evaluation with Apache Harmony JVM shows that the engine can achieve an average 6.2% speed-up with SPECJVM98 and DaCapo on Intel Pentium 4 platform, in spite of the runtime overhead.