Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Unified Compiler Framework for Control and Data Speculation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
The garbage collection advantage: improving program locality
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Using hardware performance monitors to understand the behavior of java applications
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
The java hotspotTM server compiler
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
How a Java VM can get more from a hardware performance monitor
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Mining opportunities for code improvement in a just-in-time compiler
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Identifying the sources of cache misses in Java programs without relying on hardware counters
Proceedings of the 2012 international symposium on Memory Management
Hi-index | 0.00 |
While the concept of online profile directed dynamic optimizations using hardware performance monitoring unit (PMU) data is not new, it has seen fairly limited or no use in commercial JVMs. The main reason behind this fact is the set of significant challenges involved in (1) obtaining low overhead and usable profiling support from the underlying platform (2) the complexity of filtering, interpreting and using precise PMU events online in a JVM environment (3) demonstrating the total runtime benefit of PMU data based optimizations above and beyond regular online profile based optimizations. In this paper we address all three challenges by presenting a practical framework for PMU data collection and use within a high performance product JVM on a highly scalable server platform. Our experiments with JavaTM workloads using the SunTM HotspotTM JDK 1.6 JVM on the Intel ® Itanium ® platform indicate that the hardware data collection overhead (less than 0.5%) is not as significant as the challenge of extracting the precise information for optimization purposes. We demonstrate the feasibility of mapping the instruction IP address based hardware event information to the runtime components as well as the JIT server compiler internal data structures for use in optimizations within a dynamic environment. We also evaluated the additional performance potential of optimizations such as object co-location during garbage collection and global instruction scheduling in the JIT compiler with the use of PMU generated load latency information. Experimental results show performance improvements of up to 14% with an average of 2.2% across select Java server benchmarks such as SPECjbb2005[16], SPECjvm2008[17] and Dacapo[18]. These benefits were observed over and above those provided by profile guided server JVM optimizations in the absence of hardware PMU data.