Concurrency features for the Trellis/Owl language
European conference on object-oriented programming on ECOOP '87
Fast mutual exclusion for uniprocessors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Scheduler activations: effective kernel support for the user-level management of parallelism
ACM Transactions on Computer Systems (TOCS)
A methodology for implementing highly concurrent data objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Implementing atomic sequences on uniprocessors using rollforward
Software—Practice & Experience
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
Practical implementations of non-blocking synchronization primitives
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
Lock-free data structures
Memory allocation for long-running server applications
Proceedings of the 1st international symposium on Memory management
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Atomic heap transactions and fine-grain interrupts
Proceedings of the fourth ACM SIGPLAN international conference on Functional programming
Solaris internals: core kernel architecture
Solaris internals: core kernel architecture
Cycles to recycle: garbage collection to the IA-64
Proceedings of the 2nd international symposium on Memory management
Experience with an efficient parallel kernel memory allocator
Software—Practice & Experience
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Improving scalability of multithreaded dynamic memory allocation
Dr. Dobb's Journal
Dynamic Storage Allocation: A Survey and Critical Review
IWMM '95 Proceedings of the International Workshop on Memory Management
Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors
ISPAN '99 Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks
Nonblocking synchronization and system design
Nonblocking synchronization and system design
malloc() performance in a multithreaded Linux environment
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Distributed Computing - Special issue: Selected papers from PODC '01
Scalable lock-free dynamic memory allocation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Bringing practical lock-free synchronization to 64-bit applications
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Nonblocking memory management support for dynamic-sized data structures
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments
Revocable locks for non-blocking programming
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Practice of parallelizing network applications on multi-core architectures
Proceedings of the 23rd international conference on Supercomputing
Supporting per-processor local-allocation buffers using multi-processor restartable critical sections
Simplifying concurrent algorithms by exploiting hardware transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
Dynamic adaptive scheduling for virtual machines
Proceedings of the 20th international symposium on High performance distributed computing
Allocating memory in a lock-free manner
ESA'05 Proceedings of the 13th annual European conference on Algorithms
Lock cohorting: a general technique for designing NUMA locks
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Scalable SIMD-parallel memory allocation for many-core machines
The Journal of Supercomputing
KMA: A Dynamic Memory Manager for OpenCL
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.00 |
Modern multithreaded applications, such as application servers and database engines, can severely stress the performance of user-level memory allocators like the ubiquitous malloc subsystem. Such allocators can prove to be a major scalability impediment for the applications that use them, particularly for applications with large numbers of threads running on high-order multiprocessor systems.This paper introduces Multi-Processor Restartable Critical Sections, or MP-RCS. MP-RCS permits user-level threads to know precisely which processor they are executing on and then to safely manipulate CPU-specific data, such as malloc metadata, without locks or atomic instructions. MP-RCS avoids interference by using upcalls to notify user-level threads when preemption or migration has occurred. The upcall will abort and restart any interrupted critical sections.We use MP-RCS to implement a malloc package, LFMalloc (Lock-Free Malloc). LFMalloc is scalable, has extremely low latency, excellent cache characteristics, and is memory efficient. We present data from some existing benchmarks showing that LFMalloc is often 10 times faster than Hoard, another malloc replacement package.