Scalable real-time parallel garbage collection for symmetric multiprocessors

  • Authors:
  • Guy Blelloch;Robert Harper;Perry Cheng

  • Affiliations:
  • Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University

  • Venue:
  • Scalable real-time parallel garbage collection for symmetric multiprocessors
  • Year:
  • 2001

Abstract

Garbage collection, or automatic memory reclamation, frees the programmer from the tedious and often error-prone task of memory deallocation. Since its incorporation into LISP around 1960, it has proved popular among high-level and safe programming languages such as Perl, ML, and Java. Because collection disrupts the application, past efforts have concentrated on bounding application pause times by collecting incrementally or by using a multiprocessor. However, these collectors, though sometimes successful, are not suitable for providing both scalable parallelism and real-time response at the same time.

This thesis describes the first garbage collection algorithm that is both parallel and real-time. Using an abstract symmetric multiprocessor model, we prove space and time bounds guaranteeing that every application pause is bounded by a constant and that the collector takes bounded space. Because multiple collector and application threads may execute simultaneously, careful synchronization is required to preserve correctness. To achieve real-time bounds, large objects must be copied incrementally, and particular attention must be paid to turning the collector on and off. Finally, load balancing using a scalable stack or queue provides good parallelism.

To make the algorithm practical for incorporation into a runtime system, two types of changes are necessary. The first concerns the interface: most compilers require a richer interface that includes stacks and global variables, and certain constructs in the abstract model that have no obvious translation must be eliminated. The difficulty in these modifications lies in preserving the other properties of the collector. The second consists of algorithmic changes adopted to improve performance, which reduce collection overhead and memory consumption.

The resulting algorithm has been implemented in the context of the TILT/SML compiler. The collector was tested on an Enterprise 10K using a set of 15 benchmarks. In the 5 ms range, the collector always provides the application with 15% to 35% access to the processor, even while the collector is on. As for scalability, the collector achieves a speedup of 24 to 29 when running with 32 processors. In all cases, the collector's speedup is greater than that of the application.
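
The processor-access figure in the abstract can be read as a lower bound on the fraction of any short time window that the application gets to run. The sketch below is only an illustration of that reading, not the thesis's measurement code: it assumes a trace of non-overlapping collector pauses on one processor (times in milliseconds) and interprets "the 5 ms range" as a 5 ms window. The function names and the sample trace are hypothetical.

```python
def utilization(pauses, start, window):
    """Fraction of [start, start + window] not covered by collector pauses."""
    end = start + window
    paused = sum(max(0.0, min(e, end) - max(s, start)) for s, e in pauses)
    return 1.0 - paused / window


def min_utilization(pauses, total, window):
    """Smallest application share over every window of the given length.

    The share is piecewise linear in the window's start time, so its minimum
    lies at a start where a window edge meets a pause boundary (or at the
    ends of the trace). Only those candidate starts are checked.
    """
    last_start = total - window
    candidates = {0.0, last_start}
    for s, e in pauses:
        for t in (s, e, s - window, e - window):
            if 0.0 <= t <= last_start:
                candidates.add(t)
    return min(utilization(pauses, t, window) for t in candidates)


# Hypothetical trace: collector runs during [0, 1) ms and [2, 4) ms of a 10 ms run.
trace = [(0.0, 1.0), (2.0, 4.0)]
worst = min_utilization(trace, total=10.0, window=5.0)
```

In this made-up trace the worst 5 ms window is the first one, where the collector occupies 3 ms, so the application's share never falls below 40%.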