A study of the scalability of stop-the-world garbage collectors on multicores

Authors:
Lokesh Gidra; Gaël Thomas; Julien Sopena; Marc Shapiro
Affiliations:
LIP6-INRIA/UPMC, Paris, France; LIP6-INRIA/UPMC, Paris, France; LIP6-INRIA/UPMC, Paris, France; LIP6-INRIA/UPMC, Paris, France
Venue:
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Year:
2013

Abstract

Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms perform well with a small number of cores, but have been shown to degrade badly beyond approximately 8 cores on a 48-core machine running OpenJDK 7. This negative result raises the question of whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core counts, but scales well up to 48 cores.