Scalable concurrent and parallel mark

Authors:
Balaji Iyengar; Edward Gehringer; Michael Wolf; Karthikeyan Manivannan
Affiliations:
Azul Systems Inc., Sunnyvale, CA, USA; North Carolina State University, Raleigh, NC, USA; Azul Systems Inc., Sunnyvale, CA, USA; Azul Systems Inc., Sunnyvale, CA, USA
Venue:
Proceedings of the 2012 international symposium on Memory Management
Year:
2012

References (24):

MULTILISP: a language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems (TOPLAS).
Thread scheduling for multiprogrammed multiprocessors. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures.
A generational on-the-fly garbage collector for Java. Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation (PLDI '00).
Reducing garbage collector cache misses. Proceedings of the 2nd international symposium on Memory management.
Reducing pause time of conservative collectors. Proceedings of the 3rd international symposium on Memory management.
A parallel, incremental and concurrent GC for servers. Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation (PLDI '02).
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors. Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS '01).
On-the-fly garbage collection: an exercise in cooperation. Language Hierarchies and Interfaces, International Summer School.
An on-the-fly mark and sweep garbage collector based on sliding views. Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA '03).
A Generational Mostly-concurrent Garbage Collector.
Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign. Proceedings of the 11th international conference on Architectural support for programming languages and operating systems (ASPLOS XI).
Garbage-first garbage collection. Proceedings of the 4th international symposium on Memory management.
NUMA-Aware Java Heaps for Server Applications. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS '05), Volume 1.
The pauseless GC algorithm. Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments.
A parallel, incremental, mostly concurrent garbage collector for servers. ACM Transactions on Programming Languages and Systems (TOPLAS).
The DaCapo benchmarks: Java benchmarking development and analysis. Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications.
Parallel garbage collection for shared memory multiprocessors. Proceedings of the 2001 Symposium on Java Virtual Machine Research and Technology Symposium (JVM '01), Volume 1.
Effective prefetch for mark-sweep garbage collection. Proceedings of the 6th international symposium on Memory management.
Limits of parallel marking garbage collection. Proceedings of the 7th international symposium on Memory management.
A study of Java object demographics. Proceedings of the 7th international symposium on Memory management.
NUMA-aware memory manager with dominant-thread-based copying GC. Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications.
Concurrent, parallel, real-time garbage-collection. Proceedings of the 2010 international symposium on Memory management.
C4: the continuously concurrent compacting collector. Proceedings of the international symposium on Memory management.
The Garbage Collection Handbook: The Art of Automatic Memory Management.

Abstract

Parallel marking algorithms use multiple threads to walk the heap's object graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or in the object header. Most of these parallel algorithms use work-stealing for load balancing, both to improve marking throughput and to keep all participating threads busy. A purely "processor-centric" load-balancing approach, combined with the need to set the mark bit atomically, results in significant contention during parallel marking, which limits the scalability and throughput of parallel marking algorithms. We describe a new non-blocking, lock-free work-sharing algorithm whose primary goal is to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking and improves performance by 20%. In concurrent and on-the-fly collectors, mutator threads also generate marking work for the marking task-threads. In these schemes, mutator threads are provided with thread-local marking stacks in which they collect references to potentially "gray" objects, i.e., objects that have not yet been "marked-through" by the collector. Because this work is generated when mutators reference these objects, the objects are likely to still be present in the processor cache. We describe and evaluate a scheme for distributing mutator-generated marking work among the collector's task-threads that is cognizant of the processor and cache topology. We prototype both algorithms within the C4 [28] collector, which ships as part of an industrial-strength JVM for the Linux-X86 platform.
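The contention described in the abstract arises when many marker threads issue atomic updates to the same mark-bitmap word. The C++ sketch below illustrates the two ingredients: an atomic mark-bit update and a striping key derived from the index of the bitmap word that holds an object's mark bit. It assumes one mark bit per 8-byte heap granule, 64-bit bitmap words, and a simple `thread_id % num_stripes` assignment of threads to stripes. `MarkBitmap`, `stripe_of`, and `thread_serves_stripe` are hypothetical names, and the paper's lock-free work-sharing queues are not shown; this is an illustration of the keying idea, not the authors' implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: one mark bit per 8-byte heap granule, 64 bits per bitmap word.
constexpr std::size_t kGranuleBytes = 8;
constexpr std::size_t kBitsPerWord = 64;

class MarkBitmap {
 public:
  MarkBitmap(std::uintptr_t heap_base, std::size_t heap_bytes)
      : base_(heap_base),
        words_((heap_bytes / kGranuleBytes + kBitsPerWord - 1) / kBitsPerWord) {}

  // Index of the bitmap word that holds the mark bit for addr.
  std::size_t word_index(std::uintptr_t addr) const {
    return granule(addr) / kBitsPerWord;
  }

  // Atomically sets addr's mark bit. Returns true only for the caller that
  // changed the bit from 0 to 1, so exactly one thread marks through the object.
  // Relaxed ordering covers only the bit itself; handing the object to another
  // thread needs its own synchronization.
  bool try_mark(std::uintptr_t addr) {
    std::size_t g = granule(addr);
    std::uint64_t bit = std::uint64_t{1} << (g % kBitsPerWord);
    std::uint64_t old =
        words_[g / kBitsPerWord].fetch_or(bit, std::memory_order_relaxed);
    return (old & bit) == 0;
  }

 private:
  // Granule number of addr relative to the heap base (bounds-checked in debug).
  std::size_t granule(std::uintptr_t addr) const {
    assert(addr >= base_);
    std::size_t g = (addr - base_) / kGranuleBytes;
    assert(g / kBitsPerWord < words_.size());
    return g;
  }

  std::uintptr_t base_;
  std::vector<std::atomic<std::uint64_t>> words_;
};

// Hypothetical striping key: the bitmap word index. All objects whose mark
// bits share a word map to the same stripe.
std::size_t stripe_of(const MarkBitmap& bm, std::uintptr_t addr,
                      std::size_t num_stripes) {
  return bm.word_index(addr) % num_stripes;
}

// Hypothetical assignment: thread t serves stripe t % num_stripes, so each
// stripe is worked on by only a subset of the task-threads.
bool thread_serves_stripe(std::size_t thread_id, std::size_t stripe,
                          std::size_t num_stripes) {
  return thread_id % num_stripes == stripe;
}
```

If a thread that discovers a reference forwards it to the threads serving `stripe_of(...)` instead of marking it directly, only that subset of threads ever issues `fetch_or` on a given bitmap word, which is the kind of contention filtering the abstract describes.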
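The abstract does not spell out the topology-aware distribution policy, so the C++ sketch below shows one plausible hand-off under a deliberately flat model in which each CPU belongs to a single shared-cache domain. A mutator's thread-local marking stack goes to the least-loaded collector task-thread in the mutator's cache domain, falling back to the least-loaded thread overall when no collector shares that domain. `Topology`, `MutatorMarkStack`, `CollectorThread`, `pick_collector`, and `hand_off` are hypothetical names, the work lists are single-threaded stand-ins, and the policy is an illustration rather than the scheme evaluated in the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical topology model: cache_domain_of_cpu[c] is the shared-cache
// domain (e.g. a socket's last-level cache) of CPU c.
struct Topology {
  std::vector<int> cache_domain_of_cpu;
};

// Thread-local stack of references to potentially gray objects, filled by a
// mutator on the given CPU.
struct MutatorMarkStack {
  int cpu;
  std::vector<std::uintptr_t> refs;
};

// A collector task-thread pinned to a CPU, with a simplified work list.
struct CollectorThread {
  int cpu;
  std::vector<std::uintptr_t> work;
};

// Picks the least-loaded collector in the mutator's cache domain; if none
// shares that domain, picks the least-loaded collector overall.
std::size_t pick_collector(const Topology& topo,
                           const std::vector<CollectorThread>& collectors,
                           int mutator_cpu) {
  assert(!collectors.empty());
  int domain = topo.cache_domain_of_cpu.at(mutator_cpu);
  std::size_t best = collectors.size();  // "none found yet"
  std::size_t fallback = 0;
  for (std::size_t i = 0; i < collectors.size(); ++i) {
    const CollectorThread& c = collectors[i];
    if (c.work.size() < collectors[fallback].work.size()) fallback = i;
    if (topo.cache_domain_of_cpu.at(c.cpu) == domain &&
        (best == collectors.size() ||
         c.work.size() < collectors[best].work.size())) {
      best = i;
    }
  }
  return best != collectors.size() ? best : fallback;
}

// Moves all references from the mutator's stack to the chosen collector.
void hand_off(const Topology& topo, std::vector<CollectorThread>& collectors,
              MutatorMarkStack& stack) {
  std::size_t target = pick_collector(topo, collectors, stack.cpu);
  std::vector<std::uintptr_t>& work = collectors[target].work;
  work.insert(work.end(), stack.refs.begin(), stack.refs.end());
  stack.refs.clear();
}
```

Preferring a collector thread in the mutator's cache domain aims to let it mark through the objects while they are likely still cached, which is the locality argument the abstract makes; the load-based choice within a domain keeps a single collector thread from absorbing all of that domain's mutator work.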