MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Garbage collection in an uncooperative environment
Software—Practice & Experience
Garbage collection in MultiScheme
Proceedings of the US/Japan workshop on Parallel Lisp on Parallel Lisp: languages and systems
Space efficient conservative garbage collection
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A concurrent copying garbage collector for languages that distinguish (im)mutable data
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A concurrent, generational garbage collector for a multithreaded implementation of ML
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A methodology for implementing highly concurrent data objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Concurrent replicating garbage collection
LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
Remarks on A methodology for implementing highly concurrent data
ACM SIGPLAN Notices
Notes on “A methodology for implementing highly concurrent data objects”
ACM Transactions on Programming Languages and Systems (TOPLAS)
Garbage collection: algorithms for automatic dynamic memory management
Garbage collection: algorithms for automatic dynamic memory management
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Lock-Free Garbage Collection for Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Evaluation of Parallel Copying Garbage Collection on a Shared-Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Uniprocessor Garbage Collection Techniques
IWMM '92 Proceedings of the International Workshop on Memory Management
ICC++-AC++ Dialect for High Performance Parallel Computing
ISOTAS '96 Proceedings of the Second JSSST International Symposium on Object Technologies for Advanced Software
On bounding time and space for multiprocessor garbage collection
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An on-the-fly reference counting garbage collector for Java
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Reducing pause time of conservative collectors
Proceedings of the 3rd international symposium on Memory management
A parallel, incremental and concurrent GC for servers
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On bounding time and space for multiprocessor garbage collection
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Garbage-first garbage collection
Proceedings of the 4th international symposium on Memory management
A parallel, incremental, mostly concurrent garbage collector for servers
ACM Transactions on Programming Languages and Systems (TOPLAS)
An on-the-fly reference-counting garbage collector for java
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
Lock-free parallel and concurrent garbage collection by mark&sweep
Science of Computer Programming
An efficient on-the-fly cycle collection
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel generational-copying garbage collection with a block-structured heap
Proceedings of the 7th international symposium on Memory management
Space-and-time efficient garbage collectors for parallel systems
Proceedings of the 6th ACM conference on Computing frontiers
A new approach to parallelising tracing algorithms
Proceedings of the 2009 international symposium on Memory management
A comparative evaluation of parallel garbage collector implementations
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Tracing garbage collection on highly parallel platforms
Proceedings of the 2010 international symposium on Memory management
Iterative data-parallel mark&sweep on a GPU
Proceedings of the international symposium on Memory management
Lock-Free parallel garbage collection
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Achieving middleware execution efficiency: hardware-assisted garbage collection operations
The Journal of Supercomputing
Age-Oriented concurrent garbage collection
CC'05 Proceedings of the 14th international conference on Compiler Construction
An efficient on-the-fly cycle collection
CC'05 Proceedings of the 14th international conference on Compiler Construction
A localized tracing scheme applied to garbage collection
APLAS'06 Proceedings of the 4th Asian conference on Programming Languages and Systems
Memory management for many-core processors with software configurable locality policies
Proceedings of the 2012 international symposium on Memory Management
A study of data structures with a deep heap shape
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Hi-index | 0.00 |
This work describes implementation of a mark-sweep garbage collector (GC) for shared-memory machines and reports its performance. It is a simple ''parallel'' collector in which all processors cooperatively traverse objects in the global shared heap. The collector stops the application program during a collection and assumes a uniform access cost to all locations in the shared heap. Implementation is based on the Boehm-Demers-Weiser conservative GC (Boehm GC). Experiments have been done on Ultra Enterprise 10000 (Ultra Sparc processor 250 MHz, 64 processors). We wrote two applications, BH (an N-body problem solver) and CKY (a context free grammar parser) in a parallel extension to C++.Through the experiments, We observe that load balancing is the key to achieving scalability. A naive collector without load redistribution hardly exhibits speed-up (at most fourfold speed-up on 64 processors). Performance can be improved by dynamic load balancing, which exchanges objects to be scanned between processors, but we still observe that straightforward implementation severely limits performance. First, large objects become a source of significant load imbalance, because the unit of load redistribution is a single object. Performance is improved by splitting a large object into small pieces before pushing it onto the mark stack. Next, processors spend a significant amount of time uselessly because of serializing method for termination detection using a shared counter. This problem suddenly appeared on more than 32 processors. By implementing non-serializing method for termination detection, the idle time is eliminated and performance is improved. With all these careful implementation, we achieved average speed-up of 28.0 in BH and 28.6 in CKY on 64 processors.