Mostly lock-free malloc

Authors:
Dave Dice; Alex Garthwaite
Affiliations:
Sun Microsystems, Inc., Burlington, MA; Sun Microsystems Laboratories, Burlington, MA
Venue:
Proceedings of the 3rd international symposium on Memory management
Year:
2002

References (21)

Concurrency features for the Trellis/Owl language. European conference on object-oriented programming on ECOOP '87.
Fast mutual exclusion for uniprocessors. ASPLOS V: Proceedings of the fifth international conference on Architectural support for programming languages and operating systems.
Scheduler activations: effective kernel support for the user-level management of parallelism. ACM Transactions on Computer Systems (TOCS).
A methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems (TOPLAS).
Implementing atomic sequences on uniprocessors using rollforward. Software—Practice & Experience.
Scheduler-conscious synchronization. ACM Transactions on Computer Systems (TOCS).
Practical implementations of non-blocking synchronization primitives. PODC '97: Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing.
Lock-free data structures.
Memory allocation for long-running server applications. Proceedings of the 1st international symposium on Memory management.
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system. OSDI '99: Proceedings of the third symposium on Operating systems design and implementation.
Atomic heap transactions and fine-grain interrupts. Proceedings of the fourth ACM SIGPLAN international conference on Functional programming.
Solaris internals: core kernel architecture.
Cycles to recycle: garbage collection to the IA-64. Proceedings of the 2nd international symposium on Memory management.
Experience with an efficient parallel kernel memory allocator. Software—Practice & Experience.
Hoard: a scalable memory allocator for multithreaded applications. ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems.
Improving scalability of multithreaded dynamic memory allocation. Dr. Dobb's Journal.
Dynamic Storage Allocation: A Survey and Critical Review. IWMM '95: Proceedings of the International Workshop on Memory Management.
Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources. Proceedings of the General Track: 2002 USENIX Annual Technical Conference.
A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors. ISPAN '99: Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks.
Nonblocking synchronization and system design.
malloc() performance in a multithreaded Linux environment. ATEC '00: Proceedings of the annual conference on USENIX Annual Technical Conference.

Cited by (18)

Lock-free reference counting. Distributed Computing, Special issue: Selected papers from PODC '01.
Scalable lock-free dynamic memory allocation. Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation.
Bringing practical lock-free synchronization to 64-bit applications. Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing.
Nonblocking memory management support for dynamic-sized data structures. ACM Transactions on Computer Systems (TOCS).
Supporting per-processor local-allocation buffers using lightweight user-level preemption notification. Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments.
Revocable locks for non-blocking programming. Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming.
Practice of parallelizing network applications on multi-core architectures. Proceedings of the 23rd international conference on Supercomputing.
Supporting per-processor local-allocation buffers using multi-processor restartable critical sections.
Simplifying concurrent algorithms by exploiting hardware transactional memory. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures.
MapCG: writing parallel program portable between CPU and GPU. Proceedings of the 19th international conference on Parallel architectures and compilation techniques.
Cache index-aware memory allocation. Proceedings of the international symposium on Memory management.
Dynamic adaptive scheduling for virtual machines. Proceedings of the 20th international symposium on High performance distributed computing.
Allocating memory in a lock-free manner. ESA'05: Proceedings of the 13th annual European conference on Algorithms.
Lock cohorting: a general technique for designing NUMA locks. Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming.
Scalable statistics counters. Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures.
Scalable SIMD-parallel memory allocation for many-core machines. The Journal of Supercomputing.
A hierarchical parallel discrete event simulation kernel for multicore platform. Cluster Computing.
KMA: A Dynamic Memory Manager for OpenCL. Proceedings of Workshop on General Purpose Processing Using GPUs.

Abstract

Modern multithreaded applications, such as application servers and database engines, can severely stress the performance of user-level memory allocators like the ubiquitous malloc subsystem. Such allocators can prove to be a major scalability impediment for the applications that use them. This is particularly true for applications with large numbers of threads running on high-order multiprocessor systems.

This paper introduces Multi-Processor Restartable Critical Sections, or MP-RCS. MP-RCS lets user-level threads know precisely which processor they are executing on. They can then safely manipulate CPU-specific data, such as malloc metadata, without locks or atomic instructions. MP-RCS avoids interference by using upcalls to notify user-level threads when preemption or migration has occurred. The upcall aborts and restarts any interrupted critical section.

We use MP-RCS to implement a malloc package, LFMalloc (Lock-Free Malloc). LFMalloc is scalable and memory efficient, with extremely low latency and excellent cache characteristics. We present data from existing benchmarks showing that LFMalloc is often 10 times faster than Hoard, another malloc replacement package.
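The abort-and-restart idea can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not LFMalloc's implementation. The following names are all hypothetical:

- the per-CPU free list;
- the `interrupted` flag, which stands in for the upcall;
- the `preemption_hook`, which injects a preemption.

The CPU index is also passed in rather than obtained from MP-RCS. The sketch further assumes that each critical section performs only loads until one final committing store. Under that assumption, restarting at any point before the store is safe. In a real system the upcall, not an explicit flag check, would redirect the interrupted thread. The check-then-store below is only safe here because nothing runs concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4

/* Hypothetical free-list node; not LFMalloc's real metadata layout. */
typedef struct block {
    struct block *next;
} block;

/* Per-CPU allocator metadata. The assumption is that only a thread
   currently running on that CPU touches it, so no lock or atomic is used. */
typedef struct {
    block *free_head;
} cpu_cache;

static cpu_cache caches[NCPUS];

/* Stand-in for the upcall: set to 1 to report that the thread was
   preempted or migrated while inside the critical section. */
static volatile int interrupted = 0;

/* Hypothetical hook that injects a simulated preemption between the
   section's loads and its commit. NULL means no injection. */
static void (*preemption_hook)(int cpu) = NULL;

/* Pop a block from CPU `cpu`'s free list as a restartable critical
   section: plain loads first, then one committing store. If an
   interruption is reported before the commit, no shared state has
   been modified yet, so the section simply starts over. */
static block *rcs_pop(int cpu) {
    assert(cpu >= 0 && cpu < NCPUS);
    for (;;) {
        interrupted = 0;                 /* (re)enter the section */
        cpu_cache *c = &caches[cpu];
        block *head = c->free_head;      /* load */
        if (head == NULL)
            return NULL;                 /* empty: a slower path would refill (not shown) */
        block *next = head->next;        /* load */
        if (preemption_hook != NULL)
            preemption_hook(cpu);        /* simulated preemption point */
        if (interrupted)
            continue;                    /* aborted by the "upcall": restart */
        c->free_head = next;             /* single committing store */
        return head;
    }
}
```

Consider a hook that pops the head on the same CPU and then reports the interruption. The first attempt is abandoned, and the restarted pop returns the second block. Without the restart, the stale `next` would be committed and the first block would be handed out twice.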