Effective prefetch for mark-sweep garbage collection

Authors:
Robin Garner; Stephen M. Blackburn; Daniel Frampton
Affiliations:
Australian National University, Canberra, Australia (all authors)
Venue:
Proceedings of the 6th international symposium on Memory management
Year:
2007

Abstract

Garbage collection is a performance-critical feature of most modern object-oriented languages, and it is characterized by poor locality because it must traverse the heap. In this paper we show that by combining two very simple ideas we can significantly improve the performance of the canonical mark-sweep collector, which in turn improves application performance. We make three main contributions: 1) we develop a methodology and framework for accurately and deterministically analyzing the tracing loop at the heart of the collector; 2) we offer a number of insights into, and improvements over, conventional design choices for mark-sweep collectors; and 3) we find that two simple ideas, edge-order traversal and software prefetch, combine to greatly improve garbage collection performance, although each is unproductive in isolation. We perform a thorough analysis in the context of MMTk and Jikes RVM on a wide range of benchmarks and four different architectures. Our baseline system (which includes a number of our improvements) is very competitive with highly tuned alternatives. We show a simple marking mechanism that offers modest but consistent improvements over conventional choices. Finally, we show that enqueuing the edges (pointers) of the object graph rather than the nodes (objects) significantly increases opportunities for software prefetch, despite increasing the total number of queue operations. Combining edge-ordered enqueuing with software prefetching yields average improvements, over a large suite of benchmarks, of 20-30% in garbage collection time and 4-6% in total application performance in moderate heaps, across four architectures.
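To make the node-order versus edge-order distinction concrete, here is a minimal C sketch. It is our own illustration, not the authors' MMTk/Jikes RVM code, and it assumes a toy heap in which each object carries a boolean mark flag and at most four pointer fields. The names `Obj`, `trace_node_order`, `trace_edge_order_prefetch`, `STACK_CAP` and `PF_DEPTH` are hypothetical. The prefetching loop places a small FIFO buffer between the mark stack and the scanning code: each object is prefetched when it enters the FIFO and is examined only after `PF_DEPTH` other entries have gone ahead of it. We assume this buffering scheme for illustration only. Prefetching uses the GCC/Clang builtin `__builtin_prefetch`, and a stack overflow simply aborts, which is a deliberate simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FIELDS 4
#define STACK_CAP 1024 /* hypothetical mark-stack capacity */
#define PF_DEPTH 8     /* hypothetical prefetch distance (FIFO length) */

typedef struct Obj {
    bool marked;
    size_t nfields;
    struct Obj *fields[MAX_FIELDS];
} Obj;

typedef struct { Obj *items[STACK_CAP]; size_t top; } Stack;

static void push(Stack *s, Obj *o) {
    if (s->top == STACK_CAP) { fputs("mark stack overflow\n", stderr); exit(EXIT_FAILURE); }
    s->items[s->top++] = o;
}

static Obj *pop(Stack *s) { assert(s->top > 0); return s->items[--s->top]; }

/* Node order: the child's mark bit is read (and set) as soon as the pointer
   is discovered, leaving no time for a prefetch to hide that miss. */
size_t trace_node_order(Obj **roots, size_t nroots) {
    Stack s = { .top = 0 };
    size_t scanned = 0;
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] && !roots[i]->marked) { roots[i]->marked = true; push(&s, roots[i]); }
    while (s.top > 0) {
        Obj *o = pop(&s);
        scanned++;
        for (size_t f = 0; f < o->nfields; f++) {
            Obj *c = o->fields[f];
            if (c && !c->marked) { c->marked = true; push(&s, c); }
        }
    }
    return scanned;
}

/* Edge order: every non-null target is pushed without being touched; its mark
   bit is tested only after it has waited in the FIFO behind PF_DEPTH entries. */
size_t trace_edge_order_prefetch(Obj **roots, size_t nroots) {
    Stack s = { .top = 0 };
    Obj *fifo[PF_DEPTH];
    size_t head = 0, count = 0, scanned = 0;
    for (size_t i = 0; i < nroots; i++)
        if (roots[i]) push(&s, roots[i]);
    while (s.top > 0 || count > 0) {
        /* Top up the FIFO, issuing a prefetch (write intent) for each new entry. */
        while (count < PF_DEPTH && s.top > 0) {
            Obj *o = pop(&s);
            __builtin_prefetch(o, 1);
            fifo[(head + count) % PF_DEPTH] = o;
            count++;
        }
        Obj *o = fifo[head];
        head = (head + 1) % PF_DEPTH;
        count--;
        if (o->marked) continue; /* already reached through another edge */
        o->marked = true;
        scanned++;
        for (size_t f = 0; f < o->nfields; f++)
            if (o->fields[f]) push(&s, o->fields[f]);
    }
    return scanned;
}
```

On a small heap containing a cycle and an unreachable object, both functions should mark and scan the same reachable set. They differ only in when each target's mark bit is first read. Edge order pushes one entry per non-null pointer rather than one per newly marked object. This is the extra queue traffic the abstract mentions, paid in exchange for a window in which the prefetch can complete.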