Effective “static-graph” reorganization to improve locality in garbage-collected systems
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Garbage collection: algorithms for automatic dynamic memory management
Garbage collection: algorithms for automatic dynamic memory management
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
Cycles to recycle: garbage collection to the IA-64
Proceedings of the 2nd international symposium on Memory management
A LISP garbage-collector for virtual-memory computer systems
Communications of the ACM
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Object Type Directed Garbage Collection To Improve Locality
IWMM '92 Proceedings of the International Workshop on Memory Management
Data Flow Analysis for Software Prefetching Linked Data Structures in Java
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Address/memory management for a gigantic LISP environment or, GC considered harmful
LFP '80 Proceedings of the 1980 ACM conference on LISP and functional programming
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
Quantifying Load Stream Behavior
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Connectivity-based garbage collection
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Improving 64-Bit Java IPF Performance by Compressing Heap References
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Java server performance: a case study of building efficient, scalable Jvms
IBM Systems Journal
Concurrency and Computation: Practice & Experience - 2002 ACM Java Grande—ISCOPE Conference Part I
Garbage collection without paging
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Nonintrusive precision instrumentation of microcontroller software
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
System level perspective on object locality
OOPSLA '05 Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Using Scratchpad to Exploit Object Locality in Java
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Region Monitoring for Local Phase Detection in Dynamic Optimization Systems
Proceedings of the International Symposium on Code Generation and Optimization
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
Online performance auditing: using hot optimizations without getting burned
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Profile-guided proactive garbage collection for locality optimization
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
IEEE Transactions on Computers
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using hpm-sampling to drive dynamic compilation
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Online Phase-Adaptive Data Layout Selection
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Runtime engine for dynamic profile guided stride prefetching
Journal of Computer Science and Technology
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
How a Java VM can get more from a hardware performance monitor
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Novel online profiling for virtual machines
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Loaf: a framework and infrastructure for creating online adaptive solutions
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Using platform-specific performance counters for dynamic compilation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
XAMM: a high-performance automatic memory management system with memory-constrained designs
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Identifying the sources of cache misses in Java programs without relying on hardware counters
Proceedings of the 2012 international symposium on Memory Management
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Simple profile rectifications go a long way
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Hi-index | 0.00 |
Cache miss stalls hurt performance because of the large gap between memory and processor speeds - for example, the popular server benchmark SPEC JBB2000 spends 45% of its cycles stalled waiting for memory requests on the Itanium® 2 processor. Traversing linked data structures causes a large portion of these stalls. Prefetching for linked data structures remains a major challenge because serial data dependencies between elements in a linked data structure preclude the timely materialization of prefetch addresses. This paper presents Mississippi Delta (MS Delta), a novel technique for prefetching linked data structures that closely integrates the hardware performance monitor (HPM), the garbage collector's global view of heap and object layout, the type-level metadata inherent in type-safe programs, and JIT compiler analysis. The garbage collector uses the HPM's data cache miss information to identify cache miss intensive traversal paths through linked data structures, and then discovers regular distances (deltas) between these linked objects. JIT compiler analysis injects prefetch instructions using deltas to materialize prefetch addresses.We have implemented MS Delta in a fully dynamic profile-guided optimization system: the StarJIT dynamic compiler [1] and the ORP Java virtual machine [9]. We demonstrate a 28-29% reduction in stall cycles attributable to the high-latency cache misses targeted by MS Delta and a speedup of 11-14% on the cache miss intensive SPEC JBB2000 benchmark.