Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Scalable lock-free dynamic memory allocation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Practical, transparent operating system support for superpages
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Understanding The Linux Kernel
Understanding The Linux Kernel
"MAMA!": a memory allocator for multithreaded architectures
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Windows Internals: Including Windows Server 2008 and Windows Vista, Fifth Edition
Windows Internals: Including Windows Server 2008 and Windows Vista, Fifth Edition
The International Exascale Software Project roadmap
International Journal of High Performance Computing Applications
Performance and Scalability Evaluation of 'Big Memory' on Blue Gene Linux
International Journal of High Performance Computing Applications
Scalable address spaces using RCU balanced trees
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC
IWOMP'10 Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: accelerators, Tasking and more
Performance characteristics of explicit superpage support
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Hi-index | 0.00 |
Due to computer architecture evolution, more and more HPC applications have to include thread-based parallelism and take care of memory consumption. Such evolutions require more attention to the full memory management chain, particularly stressed in multi-threaded context. Several memory allocators provide better scalability on the user-space side. But, with the steadily increasing number of cores, the impact of the operating system cannot be neglected anymore. We measured performance impact of the OS memory sub-system for up to one third of the total execution time of a real application on 128 cores. On modern architectures, we measured that up to 40% of the page fault time is spent in page zeroing. In this paper, we detail a proposal to improve paging performance by removing the needs of this unproductive page zeroing through an extension of the mmap semantic. To this end, we added a kernel-level memory page pool per process to locally reuse free pages without content reset. Our experiments show significant performance improvements especially for huge pages.