Eliminating read barriers through procrastination and cleanliness

Authors:
KC Sivaramakrishnan; Lukasz Ziarek; Suresh Jagannathan
Affiliations:
Purdue University, West Lafayette, IN, USA (all three authors)
Venue:
Proceedings of the 2012 International Symposium on Memory Management
Year:
2012

Abstract

Managed languages typically use read barriers to interpret forwarding pointers, which are introduced to keep track of copied objects. For example, in a multicore environment with thread-local heaps and a global shared heap, an object initially allocated on a local heap may be copied to the shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be installed in the original object to point to the copy. This level of indirection avoids the need to update all references to the copied object. In this paper, we consider the design of a managed runtime that eliminates read barriers. Our design is premised on the availability of enough concurrency to stall operations that would otherwise require the copy. Stalled actions are deferred until the next local collection, so forwarding pointers are never exposed to the mutator. In certain important cases, procrastination is unnecessary: lightweight runtime techniques can sometimes allow objects to be copied eagerly, either when their set of incoming references is known or when it can be determined that having multiple copies would not violate program semantics. We evaluate our techniques on three platforms: a 16-core AMD64 machine, a 48-core Intel SCC, and an 864-core Azul Vega 3. Experimental results over a range of parallel benchmarks indicate that our approach leads to notable performance gains (20-32% on average) without incurring any additional complexity.
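The three store-time choices described above (eager copy of immutable data, eager copy of a "clean" object whose incoming references are known, and procrastination until the next local collection) can be illustrated with a small toy model. This is a minimal Python sketch under stated assumptions, not the authors' runtime: `Obj`, `Mutator`, `lift`, `local_gc`, and `heap_refs` are hypothetical names; objects hold only scalar fields so lifting is a one-object copy; and `heap_refs` stands in for whatever mechanism tells the runtime an object's incoming references. Because `heap_refs` is only ever incremented, it over-approximates, which errs toward stalling rather than toward an unsafe eager copy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(eq=False)  # eq=False: compare objects by identity, like heap pointers
class Obj:
    """Toy heap object; fields hold scalars only (simplifying assumption)."""
    heap: str                      # "local" or "shared"
    mutable: bool = True
    heap_refs: int = 0             # hypothetical: incoming references from heap objects
    fields: Dict[str, Any] = field(default_factory=dict)


def lift(obj: Obj) -> Obj:
    """Copy a local object into the shared heap. Shallow, because fields are
    scalars here; a real runtime would lift the whole transitive closure."""
    return Obj("shared", obj.mutable, obj.heap_refs, dict(obj.fields))


class Mutator:
    def __init__(self) -> None:
        self.roots: Dict[str, Any] = {}       # this thread's stack slots
        self.local_heap: List[Obj] = []       # local objects scanned by local_gc
        self.stalled: List[Tuple[Obj, str, Obj]] = []  # stores deferred to local_gc

    def write(self, target: Obj, key: str, value: Any) -> str:
        """Store value into target.fields[key]; never installs a forwarding pointer."""
        exporting = (target.heap == "shared" and isinstance(value, Obj)
                     and value.heap == "local")
        if not exporting:
            if isinstance(value, Obj):
                value.heap_refs += 1      # never decremented: over-approximates
            target.fields[key] = value
            return "stored"
        if not value.mutable:
            # Copies of immutable data are indistinguishable, so both may coexist.
            target.fields[key] = lift(value)
            return "copied-immutable"
        if value.heap_refs == 0:
            # Clean: only stack slots can point at value, so redirect them now.
            copy = lift(value)
            self.roots = {n: (copy if r is value else r) for n, r in self.roots.items()}
            self.local_heap = [o for o in self.local_heap if o is not value]
            target.fields[key] = copy
            return "copied-clean"
        # Procrastinate: defer the store to the next local GC, which visits every
        # local reference anyway. A real runtime would switch to another thread here.
        self.stalled.append((target, key, value))
        return "stalled"

    def local_gc(self) -> None:
        """Finish stalled stores and redirect every local reference to the copy."""
        moved: Dict[int, Obj] = {}
        for target, key, value in self.stalled:
            if id(value) not in moved:
                moved[id(value)] = lift(value)
            target.fields[key] = moved[id(value)]

        def fix(ref: Any) -> Any:
            return moved.get(id(ref), ref) if isinstance(ref, Obj) else ref

        self.roots = {n: fix(r) for n, r in self.roots.items()}
        for obj in self.local_heap:
            obj.fields = {k: fix(v) for k, v in obj.fields.items()}
        self.local_heap = [o for o in self.local_heap if id(o) not in moved]
        self.stalled.clear()  # cleared last so stalled originals stay alive (stable ids)
```

In this model no path ever installs a forwarding pointer, so reading `fields` needs no barrier; the cost moves to the write path and to the extra work `local_gc` performs when it completes stalled stores.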