Effective prefetch for mark-sweep garbage collection

  • Authors:
  • Robin Garner;Stephen M. Blackburn;Daniel Frampton

  • Affiliations:
  • Australian National University, Canberra, Australia;Australian National University, Canberra, Australia;Australian National University, Canberra, Australia

  • Venue:
  • Proceedings of the 6th international symposium on Memory management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Garbage collection is a performance-critical feature of most modern object oriented languages, and is characterized by poor locality since it must traverse the heap. In this paperwe show that by combining two very simple ideas wecan significantly improve the performance of the canonical mark-sweep collector, resulting in improvements in application performance. We make three main contributions: 1) we develop a methodology and framework for accurately and deterministically analyzing the tracing loop at the heart ofthe collector, 2) we offer a number of insights and improvements over conventional design choices for mark-sweep collectors, and 3) we find that two simple ideas: edge order traversal and software prefetch. combine to greatly improve garbage collection performance although each is unproductive in isolation. We perform a thorough analysis in the context of MMTk and Jikes RVM on a wide range of benchmarks and four different architectures. Our baseline system (which includes a number of our improvements) is very competitive with highly tuned alternatives. We show a simple marking mechanism which offers modest but consistent improvements over conventional choices. Finally, we show that enqueuing the edges pointers) of the object graph rather than the nodes (objects) significantly increases opportunities for software prefetch, despite increasing the total number of queue operations. Combining edge ordered enqueuing with software prefetching yields average performance improvements over a large suite of benchmarks of 20-30% in garbage collection time and 4-6% of total application performance in moderate heaps, across four architectures.