An effective garbage collection strategy for parallel programming languages on large scale distributed-memory machines

  • Authors:
  • Kenjiro Taura; Akinori Yonezawa

  • Affiliations:
  • Department of Information Science, Faculty of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan (both authors)

  • Venue:
  • PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 1997

Abstract

This paper describes the design and implementation of a garbage collection scheme on large-scale distributed-memory computers and reports various experimental results. The collector is based on the conservative GC library by Boehm & Weiser. Each processor traces local pointers using the GC library and traverses remote pointers by exchanging "mark messages" with other processors. The collector exhibits promising performance: in the most space-intensive settings we tested, the total collection overhead ranges from 5% to 15% of the application running time (excluding idle time). We not only examine basic performance figures such as the total overhead or latency of a global collection, but also demonstrate how local collection scheduling strategies affect application performance. In our collector, a local collection is scheduled either independently or synchronously. Experimental results show that the benefit of independent local collections has been overstated in the literature. Independent local collections slowed down application performance to 40%, by increasing the average communication latency. Synchronized local collections exhibit much more robust performance characteristics than independent local collections, and the overhead of global synchronization is not significant. Furthermore, we show that an adaptive collection scheduler can select the appropriate local collection strategy based on the application's behavior. The collector has been used in the concurrent object-oriented language ABCL/f, and its performance was measured on a large-scale parallel computer (256 processors) using four non-trivial applications written in ABCL/f.
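The marking scheme summarized in the abstract (local tracing plus "mark messages" for remote pointers) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the heap layout, the `distributed_mark` function, the per-processor inbox queues, and the "stop when all inboxes are empty" loop are all hypothetical stand-ins. The real collector relies on the Boehm & Weiser library for local tracing and on actual inter-processor messages, and the abstract does not describe its termination detection.

```python
from collections import deque


def distributed_mark(heaps, roots):
    """Simulate marking across processors (illustrative only).

    heaps[p] maps an object id on processor p to the list of
    (processor, object id) references that the object holds.
    roots[p] lists the root object ids on processor p.
    Returns one set of marked object ids per processor.
    """
    n = len(heaps)
    marked = [set() for _ in range(n)]
    # inbox[p] holds object ids that processor p has been asked to mark.
    inbox = [deque(roots[p]) for p in range(n)]

    # Assumption: "all inboxes empty" stands in for real termination detection.
    while any(inbox):
        for p in range(n):
            stack = list(inbox[p])
            inbox[p].clear()
            while stack:
                obj = stack.pop()
                if obj in marked[p]:
                    continue
                marked[p].add(obj)
                for owner, target in heaps[p][obj]:
                    if owner == p:
                        stack.append(target)         # local pointer: trace directly
                    else:
                        inbox[owner].append(target)  # remote pointer: send a mark message
    return marked


# Two processors; "a" on processor 0 is the only root.
heaps = [
    {"a": [(0, "b"), (1, "x")], "b": [], "c": [(1, "y")]},
    {"x": [(0, "b")], "y": [], "z": [(1, "z")]},
]
roots = [["a"], []]

marked = distributed_mark(heaps, roots)
garbage = [set(heaps[p]) - marked[p] for p in range(len(heaps))]
```

In this toy heap, object "x" on processor 1 is kept alive only because processor 0 sends a mark message for it, while "c", "y", and "z" are unreachable from any root and would be reclaimed.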