Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Thread-specific heaps for multi-threaded programs
Proceedings of the 2nd international symposium on Memory management
Garbage-first garbage collection
Proceedings of the 4th international symposium on Memory management
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
The DaCapo benchmarks: java benchmarking development and analysis
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Parallel garbage collection for shared memory multiprocessors
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
Stopless: a real-time garbage collector for multiprocessors
Proceedings of the 6th international symposium on Memory management
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Parallel generational-copying garbage collection with a block-structured heap
Proceedings of the 7th international symposium on Memory management
A new approach to parallelising tracing algorithms
Proceedings of the 2009 international symposium on Memory management
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Schism: fragmentation-tolerant real-time garbage collection
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Optimizations in a private nursery-based garbage collector
Proceedings of the 2010 international symposium on Memory management
Multicore garbage collection with local heaps
Proceedings of the international symposium on Memory management
C4: the continuously concurrent compacting collector
Proceedings of the international symposium on Memory management
The Garbage Collection Handbook: The Art of Automatic Memory Management
The Garbage Collection Handbook: The Art of Automatic Memory Management
Assessing the scalability of garbage collectors on many cores
PLOS '11 Proceedings of the 6th Workshop on Programming Languages and Operating Systems
Memory management for many-core processors with software configurable locality policies
Proceedings of the 2012 international symposium on Memory Management
Eliminating read barriers through procrastination and cleanliness
Proceedings of the 2012 international symposium on Memory Management
The Collie: a wait-free compacting collector
Proceedings of the 2012 international symposium on Memory Management
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Hi-index | 0.00 |
Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms demonstrate good performance with a small number of cores, but have been shown to degrade badly beyond approximately 8 cores on a 48-core with OpenJDK 7. This negative result raises the question whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core count, but scales well, up to 48~cores.