Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors
Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.

This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
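The producer-consumer blowup, and the way a shared global heap can bound it, can be illustrated with a toy counting model. This is a minimal sketch under stated assumptions, not Hoard's actual discipline, which the abstract does not spell out. The class `ToyAllocator`, the parameter `max_free_per_heap`, and the rule "spill surplus free blocks to the global heap" are all hypothetical simplifications. The model counts blocks only and manages no real memory.

```python
class ToyAllocator:
    """Counting model of per-processor heaps with an optional global heap.

    Hypothetical simplification: a freed block lands on the freeing
    thread's heap, and any surplus beyond max_free_per_heap is moved to
    the global heap. With max_free_per_heap=None there is no global heap.
    """

    def __init__(self, num_heaps, max_free_per_heap=None):
        self.free = [0] * num_heaps        # free blocks cached per heap
        self.global_free = 0               # free blocks in the global heap
        self.from_os = 0                   # blocks ever requested from the OS
        self.max_free = max_free_per_heap

    def malloc(self, heap):
        if self.free[heap] > 0:            # common case: reuse a local block
            self.free[heap] -= 1
        elif self.global_free > 0:         # next: take one from the global heap
            self.global_free -= 1
        else:                              # last resort: grow the footprint
            self.from_os += 1

    def free_block(self, heap):
        self.free[heap] += 1
        if self.max_free is not None and self.free[heap] > self.max_free:
            self.free[heap] -= 1           # spill the surplus block
            self.global_free += 1


def producer_consumer(alloc, rounds):
    """Heap 0 only allocates, heap 1 only frees; returns blocks taken from the OS."""
    for _ in range(rounds):
        alloc.malloc(0)
        alloc.free_block(1)
    return alloc.from_os


private_only = producer_consumer(ToyAllocator(2), 1000)
with_global = producer_consumer(ToyAllocator(2, max_free_per_heap=8), 1000)
print(private_only, with_global)
```

In this model the private-only configuration requests a new block from the OS on every round, so its footprint grows with the length of the run, while the configuration with a global heap stops growing once the consumer's heap reaches its cap. The real allocator additionally has to address false sharing and synchronization cost, which this sketch ignores.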