MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
A generational on-the-fly garbage collector for Java
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Reducing garbage collector cache misses
Proceedings of the 2nd international symposium on Memory management
Reducing pause time of conservative collectors
Proceedings of the 3rd international symposium on Memory management
A parallel, incremental and concurrent GC for servers
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On-the-fly garbage collection: an exercise in cooperation
Language Hierarchies and Interfaces, International Summer School
An on-the-fly mark and sweep garbage collector based on sliding views
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
A Generational Mostly-concurrent Garbage Collector
Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Garbage-first garbage collection
Proceedings of the 4th international symposium on Memory management
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments
A parallel, incremental, mostly concurrent garbage collector for servers
ACM Transactions on Programming Languages and Systems (TOPLAS)
The DaCapo benchmarks: Java benchmarking development and analysis
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Parallel garbage collection for shared memory multiprocessors
JVM'01 Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1
Effective prefetch for mark-sweep garbage collection
Proceedings of the 6th international symposium on Memory management
Limits of parallel marking garbage collection
Proceedings of the 7th international symposium on Memory management
A study of Java object demographics
Proceedings of the 7th international symposium on Memory management
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Concurrent, parallel, real-time garbage-collection
Proceedings of the 2010 international symposium on Memory management
C4: the continuously concurrent compacting collector
Proceedings of the international symposium on Memory management
The Garbage Collection Handbook: The Art of Automatic Memory Management
Parallel marking algorithms use multiple threads to traverse the heap's object graph and mark each reachable object as live. A parallel marker thread marks an object "live" by atomically setting a bit in a mark-bitmap or in the object header. Most of these algorithms try to improve marking throughput by using work stealing for load balancing, so that all participating threads are kept busy. A purely "processor-centric" load-balancing approach, combined with the need to set the mark bit atomically, results in significant contention during parallel marking, which limits the scalability and throughput of parallel marking algorithms. We describe a new non-blocking, lock-free work-sharing algorithm whose primary goal is to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among the parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking and yields a 20% improvement in performance. In concurrent and on-the-fly collector algorithms, mutator threads also generate marking work for the marking task-threads. In these schemes, mutator threads are provided with thread-local marking stacks in which they collect references to potentially "gray" objects, i.e., objects that have not yet been "marked-through" by the collector. Because this work is generated by mutators when they reference these objects, there is a high likelihood that the objects are still present in the processor cache. We describe and evaluate a scheme for distributing mutator-generated marking work among the collector's task-threads that is cognizant of the processor and cache topology. We prototype both algorithms within the C4 [28] collector, which ships as part of an industrial-strength JVM for the Linux/x86 platform.
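The two mechanisms the abstract names, atomic setting of a mark bit and striping of work by mark-bitmap word, can be illustrated with a minimal Java sketch. It is not the algorithm from the paper or from C4: the class name `StripedMarkSketch`, the one-mark-bit-per-8-byte-heap-word layout, the use of heap byte offsets instead of raw addresses, and the simple modulo mapping from bitmap word to stripe are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: names and layout are assumptions, not taken from the paper.
class StripedMarkSketch {
    // Assumption: one mark bit per 8-byte heap word.
    static final int HEAP_WORD_SHIFT = 3;

    final AtomicLongArray bitmap; // 64 mark bits per bitmap word
    final int stripes;            // number of stripes work is divided into

    StripedMarkSketch(long heapBytes, int stripes) {
        long bits = heapBytes >>> HEAP_WORD_SHIFT;
        this.bitmap = new AtomicLongArray((int) ((bits + 63) / 64));
        this.stripes = stripes;
    }

    // Index of the bitmap word holding the mark bit for this heap offset.
    int bitmapWordIndex(long heapOffsetBytes) {
        return (int) ((heapOffsetBytes >>> HEAP_WORD_SHIFT) >>> 6);
    }

    // Stripe key derived from the bitmap word, so that all objects whose mark
    // bits share a bitmap word are routed to the same stripe. If each stripe
    // is served by only a few threads, few threads ever CAS on the same word.
    int stripeFor(long heapOffsetBytes) {
        return bitmapWordIndex(heapOffsetBytes) % stripes;
    }

    // Atomically sets the mark bit. Returns true only for the caller that
    // actually flipped the bit from 0 to 1, i.e. the one that should go on
    // to scan the object's fields.
    boolean tryMark(long heapOffsetBytes) {
        long bit = heapOffsetBytes >>> HEAP_WORD_SHIFT;
        int idx = (int) (bit >>> 6);
        long mask = 1L << (bit & 63);
        while (true) {
            long old = bitmap.get(idx);
            if ((old & mask) != 0) {
                return false; // already marked by some thread
            }
            if (bitmap.compareAndSet(idx, old, old | mask)) {
                return true;  // this thread won the race
            }
            // CAS failed: another bit in the same word changed; retry.
        }
    }

    public static void main(String[] args) {
        StripedMarkSketch m = new StripedMarkSketch(1L << 20, 4);
        System.out.println("first mark wins: " + m.tryMark(64));
        System.out.println("second mark wins: " + m.tryMark(64));
        System.out.println("stripe of offset 512: " + m.stripeFor(512));
    }
}
```

In this sketch a thread that discovers a reference would push it onto a queue selected by `stripeFor` and leave the `tryMark` call to whichever thread drains that queue. The retry loop in `tryMark` is where contention shows up when many threads update the same bitmap word, which is the cost the striping is meant to reduce.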