A parallel, incremental and concurrent GC for servers

  • Authors:
  • Yoav Ossia; Ori Ben-Yitzhak; Irit Goft; Elliot K. Kolodner; Victor Leikehman; Avi Owshanko

  • Affiliations:
  • IBM Haifa Research Laboratory, Haifa 31905, Israel (all authors)

  • Venue:
  • PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation
  • Year:
  • 2002

Abstract

Multithreaded applications with multi-gigabyte heaps running on modern servers pose new challenges for garbage collection (GC). The challenges for "server-oriented" GC include ensuring short pause times on a multi-gigabyte heap while minimizing the throughput penalty, scaling well on multiprocessor hardware, and keeping to a minimum the number of expensive multi-cycle fence instructions required by weak ordering. We designed and implemented a fully parallel, incremental, mostly concurrent collector that employs several novel techniques to meet these challenges. First, it combines incremental GC, which ensures short pause times, with concurrent low-priority background GC threads that take advantage of processor idle time. Second, it employs a low-overhead work packet mechanism to enable full parallelism among the incremental and concurrent collecting threads and to ensure load balancing. Third, it reduces memory fence instructions by using batching techniques: one fence for each block of small objects allocated, one fence for each group of objects marked, and no fence at all in the write barrier. When compared to the mature, well-optimized parallel stop-the-world mark-sweep collector already in the IBM JVM, our collector prototype reduces the maximum pause time from 284 ms to 101 ms and the average pause time from 266 ms to 66 ms, while losing only 10% throughput when running the SPECjbb2000 benchmark on a 256 MB heap on a 4-way 550 MHz Pentium multiprocessor.
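
The abstract names a work packet mechanism but does not describe it in detail. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: marking work is grouped into small bounded packets held in a shared pool, and each marker thread takes a packet, traces it, and returns newly discovered work as fresh packets so that idle threads can pick it up. All names here (`PacketPool`, `mark`, `PACKET_CAPACITY`) and the dictionary-based object graph are assumptions made for illustration. The sketch does not model memory fences, the write barrier, or incremental pauses.

```python
import threading
from collections import deque

PACKET_CAPACITY = 4  # hypothetical size; kept small to force frequent hand-offs


class PacketPool:
    """Shared pool of non-empty work packets (illustrative, not the paper's design)."""

    def __init__(self):
        self._full = deque()   # packets waiting for a marker thread
        self._busy = 0         # threads currently holding a packet
        self._cond = threading.Condition()

    def put(self, packet):
        """Publish a packet of object ids; empty packets are ignored."""
        if packet:
            with self._cond:
                self._full.append(packet)
                self._cond.notify()

    def get(self):
        """Return a packet, or None once no work exists and no thread is busy."""
        with self._cond:
            while not self._full and self._busy > 0:
                self._cond.wait()
            if self._full:
                self._busy += 1
                return self._full.popleft()
            self._cond.notify_all()  # wake any other waiters so they terminate too
            return None

    def release(self):
        """Called after a thread has finished a packet and published its output."""
        with self._cond:
            self._busy -= 1
            if self._busy == 0:
                self._cond.notify_all()


def mark(graph, roots, workers=4):
    """Mark every object reachable from roots; graph maps id -> list of child ids."""
    marked = set()
    mark_lock = threading.Lock()
    pool = PacketPool()

    def try_mark(obj):
        # Atomic test-and-set so each object enters exactly one packet.
        with mark_lock:
            if obj in marked:
                return False
            marked.add(obj)
            return True

    def emit(packet, obj):
        # Append to the output packet; hand it to the pool when it fills up.
        packet.append(obj)
        if len(packet) == PACKET_CAPACITY:
            pool.put(packet)
            return []
        return packet

    def worker():
        while True:
            inp = pool.get()
            if inp is None:
                return
            out = []
            while inp:
                for child in graph[inp.pop()]:
                    if try_mark(child):
                        out = emit(out, child)
            pool.put(out)      # publish leftovers before declaring ourselves idle
            pool.release()

    packet = []
    for root in roots:
        if try_mark(root):
            packet = emit(packet, root)
    pool.put(packet)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked


if __name__ == "__main__":
    demo = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0], 5: [6], 6: [5]}
    print(sorted(mark(demo, [0])))  # prints [0, 1, 2, 3, 4]
```

Because work moves between threads only in whole packets, synchronization cost is paid once per packet rather than once per object, and any thread that runs dry can pick up packets produced by others. That is one way to obtain the load balancing the abstract describes. Termination is detected here with a simple busy counter, which is a simplification chosen for this sketch.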