Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors
Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.

This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
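The producer-consumer blowup described above, and the way a shared global heap can bound it, can be illustrated with a small accounting model. This is a sketch under stated assumptions, not Hoard's algorithm: the abstract does not spell out the discipline, so the class names, the block-counting model, and the `limit` threshold below are all hypothetical and chosen only for illustration.

```python
class PrivateHeapsOnly:
    """Toy model: each thread keeps every block it frees (no global heap)."""

    def __init__(self, threads):
        self.free_blocks = [0] * threads  # free blocks cached per thread
        self.from_os = 0                  # blocks requested from the OS so far

    def malloc(self, tid):
        if self.free_blocks[tid] > 0:
            self.free_blocks[tid] -= 1    # reuse a block this thread freed
        else:
            self.from_os += 1             # nothing cached: grow the heap

    def free(self, tid):
        self.free_blocks[tid] += 1        # block stays with the freeing thread


class PrivateWithGlobalHeap(PrivateHeapsOnly):
    """Toy model: per-thread heaps spill surplus free blocks to a global heap.

    `limit` is a hypothetical cap on free blocks a thread may hold;
    it is an assumption of this sketch, not a value from the paper.
    """

    def __init__(self, threads, limit):
        super().__init__(threads)
        self.limit = limit
        self.global_free = 0

    def malloc(self, tid):
        if self.free_blocks[tid] > 0:
            self.free_blocks[tid] -= 1
        elif self.global_free > 0:
            self.global_free -= 1         # take a block another thread released
        else:
            self.from_os += 1

    def free(self, tid):
        self.free_blocks[tid] += 1
        if self.free_blocks[tid] > self.limit:
            self.free_blocks[tid] -= 1    # move the surplus block
            self.global_free += 1         # to the shared global heap


def run_producer_consumer(allocator, rounds):
    """Thread 0 only allocates, thread 1 only frees; return OS blocks used."""
    for _ in range(rounds):
        allocator.malloc(0)
        allocator.free(1)
    return allocator.from_os


if __name__ == "__main__":
    print(run_producer_consumer(PrivateHeapsOnly(2), 1000))
    print(run_producer_consumer(PrivateWithGlobalHeap(2, limit=4), 1000))
```

In this model the private-heaps-only allocator requests a new block from the OS on every round, so its footprint grows with the length of the run, while the variant with a global heap stops growing once the consumer's cache is full. The real allocator manages memory at a coarser granularity and must also handle synchronization, which this model ignores.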