Scalable locality-conscious multithreaded memory allocation

Authors:
Scott Schneider;Christos D. Antonopoulos;Dimitrios S. Nikolopoulos
Affiliations:
College of William and Mary;College of William and Mary;College of William and Mary
Venue:
Proceedings of the 5th international symposium on Memory management
Year:
2006

Abstract

We present Streamflow, a new multithreaded memory manager designed for low-overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to those of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an innovative design which uses only synchronization-free operations in the most common case of local allocations and deallocations, while requiring minimal, non-blocking synchronization in the less common case of remote deallocations. Spatial locality at the cache and page level is favored by eliminating small-object headers, reducing allocator-induced conflicts via contiguous allocation of page blocks in physical memory, reducing allocator-induced false sharing by using segregated heaps, and achieving better TLB performance and fewer page faults via the use of superpages. Combining these locality optimizations with the drastic reduction of synchronization and latency overhead allows Streamflow to perform comparably with optimized sequential allocators and to outperform four state-of-the-art multiprocessor allocators by sizeable margins in our experiments on a shared-memory system with four two-way SMT processors. The allocation-intensive sequential and parallel benchmarks used in our experiments represent a variety of behaviors, including mostly local object allocation-deallocation patterns and producer-consumer allocation-deallocation patterns.
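
To make the ideas of segregated heaps, headerless small objects, and the local versus remote deallocation paths more concrete, the following is a minimal single-process sketch in Python. It is a toy model over integer addresses, not Streamflow's implementation: every name and constant (PAGE_BLOCK_SIZE, HEADER_SIZE, the power-of-two size classes, ToyAllocator) is an assumption chosen for illustration. Plain Python lists stand in for the free lists, so the sketch only shows which thread may touch which list; a real allocator would need an atomic, non-blocking operation such as compare-and-swap on the remote list.

```python
"""Toy model of a segregated-heap allocator with headerless small objects.

Addresses are plain integers; nothing here touches real memory.
All names and constants are illustrative assumptions.
"""

PAGE_BLOCK_SIZE = 16 * 1024   # assumed page-block size (must be a power of two)
HEADER_SIZE = 64              # assumed space reserved for per-block metadata
MIN_CLASS = 8                 # assumed smallest size class, in bytes
MAX_SMALL = 2048              # assumed largest request served by this model


def size_class(nbytes):
    """Round a request up to the next power-of-two size class."""
    cls = MIN_CLASS
    while cls < nbytes:
        cls *= 2
    return cls


class PageBlock:
    """One aligned block serving objects of a single size class."""

    def __init__(self, base, obj_size, owner):
        assert base % PAGE_BLOCK_SIZE == 0, "page blocks must be aligned"
        self.base = base
        self.obj_size = obj_size
        self.owner = owner            # id of the thread that owns the block
        first = base + HEADER_SIZE
        last = base + PAGE_BLOCK_SIZE - obj_size
        # All slots start free. Objects carry no per-object header.
        # Reversed so that pop() hands out the lowest address first.
        self.local_free = list(range(first, last + 1, obj_size))[::-1]
        self.remote_free = []         # slots freed by non-owner threads


class ToyAllocator:
    def __init__(self):
        self.blocks = {}   # block base -> PageBlock (stands in for the header)
        self.heaps = {}    # (thread id, size class) -> list of PageBlock
        self.next_base = PAGE_BLOCK_SIZE

    def _new_block(self, tid, cls):
        block = PageBlock(self.next_base, cls, tid)
        self.blocks[block.base] = block
        self.next_base += PAGE_BLOCK_SIZE
        self.heaps.setdefault((tid, cls), []).append(block)
        return block

    def malloc(self, tid, nbytes):
        if nbytes > MAX_SMALL:
            raise ValueError("this sketch models small objects only")
        cls = size_class(nbytes)
        for block in self.heaps.get((tid, cls), []):
            if not block.local_free and block.remote_free:
                # The owner reclaims all remotely freed slots in one batch.
                block.local_free, block.remote_free = block.remote_free, []
            if block.local_free:
                return block.local_free.pop()
        return self._new_block(tid, cls).local_free.pop()

    def free(self, tid, addr):
        # Headerless lookup: mask the address down to its block boundary.
        block = self.blocks[addr & ~(PAGE_BLOCK_SIZE - 1)]
        if tid == block.owner:
            block.local_free.append(addr)    # local free: owner-only list
        else:
            block.remote_free.append(addr)   # remote free: handed to owner


if __name__ == "__main__":
    alloc = ToyAllocator()
    p = alloc.malloc(tid=0, nbytes=20)   # served from thread 0's 32-byte class
    q = alloc.malloc(tid=0, nbytes=30)   # same class, neighbouring slot
    alloc.free(tid=1, addr=p)            # remote free by thread 1
    alloc.free(tid=0, addr=q)            # local free by the owning thread
    print(size_class(20), q - p)
```

In this model the owner thread touches only its own local list on the common path, and a freed slot is reused last-in, first-out, which tends to favor temporal locality. Because every object in a block has the same size and the block is aligned, the block's metadata can be found from the object's address alone, which is one common way to avoid per-object headers; whether Streamflow locates its metadata in exactly this way is not stated in the abstract.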