Parallel memory defragmentation on a GPU

Authors:
Ronald Veldema;Michael Philippsen
Affiliations:
University of Erlangen-Nuremberg, Erlangen, Germany;University of Erlangen-Nuremberg, Erlangen, Germany
Venue:
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Year:
2012

Abstract

High-throughput memory management techniques such as malloc/free or mark-and-sweep collectors often suffer from memory fragmentation, which leaves allocated objects interspersed with free memory holes. Memory defragmentation removes such holes by moving objects so that they become adjacent (compaction) and by merging neighboring holes into larger ones (coalescing). However, known defragmentation techniques are slow. This paper presents a parallel solution to best-effort partial defragmentation that makes use of all available cores. The solution not only speeds up defragmentation significantly, it also scales to many simple cores and can therefore even be implemented on a GPU. One problem with compaction is that all references to moved objects must be retargeted to point to the new locations. This paper improves on existing work by better identifying the parts of the heap that contain references to objects moved by the compactor; only these parts are processed to find the references, which are then retargeted in parallel. To demonstrate the performance of the new memory defragmentation algorithm on many-core processors, we evaluate it on a modern GPU. Parallelization speeds up compaction 40 times and coalescing up to 32 times. After compaction, our algorithm only needs to process 2% to 4% of the total heap to retarget references.
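The three steps named in the abstract (coalescing, compaction, reference retargeting) can be illustrated with a small sequential sketch. This is not the paper's parallel GPU algorithm: the `Block` type, the function names, and the toy heap model (integer addresses, references stored as lists of target addresses) are all assumptions made for illustration. In particular, `retarget` scans every block, whereas the paper restricts the scan to the parts of the heap that contain references to moved objects.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical heap block: an allocated object or a free hole."""
    addr: int
    size: int
    live: bool
    refs: list = field(default_factory=list)  # addresses of referenced objects


def coalesce(blocks):
    """Merge runs of adjacent free holes into single larger holes."""
    merged = []
    for b in sorted(blocks, key=lambda blk: blk.addr):
        prev = merged[-1] if merged else None
        if prev and not prev.live and not b.live and prev.addr + prev.size == b.addr:
            prev.size += b.size  # extend the previous hole
        else:
            merged.append(Block(b.addr, b.size, b.live, list(b.refs)))
    return merged


def compact(blocks, heap_size):
    """Slide live blocks toward address 0.

    Returns the new block list and a forwarding table mapping the old
    address of every moved object to its new address.
    """
    forwarding = {}
    compacted = []
    cursor = 0
    for b in sorted(blocks, key=lambda blk: blk.addr):
        if not b.live:
            continue
        if b.addr != cursor:
            forwarding[b.addr] = cursor
        compacted.append(Block(cursor, b.size, True, list(b.refs)))
        cursor += b.size
    if cursor < heap_size:
        # All remaining space becomes one hole at the end of the heap.
        compacted.append(Block(cursor, heap_size - cursor, False))
    return compacted, forwarding


def retarget(blocks, forwarding):
    """Rewrite references to moved objects; return how many were rewritten.

    Simplification: scans all blocks rather than only the heap parts
    known to reference moved objects.
    """
    rewritten = 0
    for b in blocks:
        for i, target in enumerate(b.refs):
            if target in forwarding:
                b.refs[i] = forwarding[target]
                rewritten += 1
    return rewritten
```

A caller would run `coalesce` on the block list, pass the result to `compact` together with the heap size, and then call `retarget` with the returned forwarding table. Each reference is looked up exactly once, so an old address that coincides with some other object's new address is not forwarded twice.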