Garbage collection for multicore NUMA machines

Authors:
Sven Auhagen;Lars Bergstrom;Matthew Fluet;John Reppy
Affiliations:
University of Chicago;University of Chicago;Rochester Institute of Technology;University of Chicago
Venue:
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Year:
2011


Abstract

Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Because this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. This bandwidth limitation has traditionally constrained the scalability of modern functional language implementations, which seldom scale well past 8 cores, even on small benchmarks. This work presents a garbage collector integrated with our strict, parallel functional language implementation, Manticore, and shows that it scales effectively on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.
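
The bandwidth argument in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: the function name, the parameters, and every number in it are hypothetical and chosen only to make the arithmetic easy to follow. It assumes each package has one link to its own RAM, that all remote traffic crosses a single shared bus, and that throughput is capped by whichever of the two saturates first.

```python
def aggregate_bandwidth(packages, local_link_gbs, bus_gbs, remote_fraction):
    """Toy estimate of total memory throughput in GB/s.

    Assumptions (illustrative only, not measurements of any real machine):
      - each package has a local memory link of `local_link_gbs`
      - all remote accesses share one bus of capacity `bus_gbs`
      - `remote_fraction` of all memory traffic targets a remote package
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")

    # Upper bound: the memory controllers together can serve this much.
    controller_limit = packages * local_link_gbs
    if remote_fraction == 0.0:
        return controller_limit

    # If a fraction r of total throughput T crosses the bus, then
    # r * T <= bus_gbs, so the bus caps T at bus_gbs / r.
    bus_limit = bus_gbs / remote_fraction
    return min(controller_limit, bus_limit)


if __name__ == "__main__":
    # Hypothetical machine: 4 packages, 10 GB/s per local link, 8 GB/s bus.
    for r in (0.0, 0.1, 0.5, 1.0):
        total = aggregate_bandwidth(4, 10.0, 8.0, r)
        print(f"remote fraction {r:.1f}: {total:.1f} GB/s")
```

With these made-up figures, fully local access reaches the sum of the local links, while fully remote access is held to the capacity of the shared bus. That is the gap a collector that keeps data local to its package would aim to avoid.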