Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Exploiting prolific types for memory management and optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static load classification for improving the value predictability of data-cache misses
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Static Identification of Delinquent Loads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Dynamic object sampling for pretenuring
Proceedings of the 4th international symposium on Memory management
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
Profile-guided proactive garbage collection for locality optimization
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
The DaCapo benchmarks: java benchmarking development and analysis
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
IBM Journal of Research and Development
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
Two memory allocators that use hints to improve locality
Proceedings of the 2009 international symposium on Memory management
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
How a Java VM can get more from a hardware performance monitor
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Efficient runtime tracking of allocation sites in Java
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Layout transformations for heap objects using static access patterns
CC'07 Proceedings of the 16th international conference on Compiler construction
Hi-index | 0.00 |
Cache miss stalls are one of the major sources of performance bottlenecks for multicore processors. A Hardware Performance Monitor (HPM) in the processor is useful for locating the cache misses, but is rarely used in the real world for various reasons. It would be better to find a simple approach to locate the sources of cache misses and apply runtime optimizations without relying on an HPM. This paper shows that pointer dereferencing in hot loops is a major source of cache misses in Java programs. Based on this observation, we devised a new approach to identify the instructions and objects that cause frequent cache misses. Our heuristic technique effectively identifies the majority of the cache misses in typical Java programs by matching the hot loops to simple idiomatic code patterns. On average, our technique selected only 2.8% of the load and store instructions generated by the JIT compiler and these instructions accounted for 47% of the L1D cache misses and 49% of the L2 cache misses caused by the JIT-compiled code. To prove the effectiveness of our technique in compiler optimizations, we prototyped object placement optimizations, which align objects in cache lines or collocate paired objects in the same cache line to reduce cache misses. For comparison, we also implemented the same optimizations based on the accurate information obtained from the HPM. Our results showed that our heuristic approach was as effective as the HPM-based approach and achieved comparable performance improvements in the SPECjbb2005 and SPECpower_ssj2008 benchmark programs.