Constantly increasing hardware parallelism poses ever greater challenges to programmers and language designers. One approach to harnessing this massive parallelism is to move to task-based programming models that rely on runtime systems for dependency analysis and scheduling. Such models generally benefit from the existence of a global address space. This paper presents the parallel memory allocator of the Myrmics runtime system, in which multiple allocator instances organized in a tree hierarchy cooperate to implement a global address space with dynamic region support on distributed memory machines. The Myrmics hierarchical memory allocator is a step towards improved productivity and performance in parallel programming. Productivity is improved through the use of dynamic regions in a global address space, which provide a convenient shared-memory abstraction for dynamic and irregular data structures. Performance is improved through scaling on manycore systems without system-wide cache coherency. We evaluate the stand-alone allocator on an MPI-based x86 cluster and find that it scales well for up to 512 worker cores, while outperforming Unified Parallel C by a factor of 3.7-10.7x.