Introducing kernel-level page reuse for high performance computing

Authors:
Sébastien Valat;Marc Pérache;William Jalby
Affiliations:
CEA, DAM, DIF, Arpajon, France;CEA, DAM, DIF, Arpajon, France;Université de Versailles, Saint-Quentin, Versailles, France
Venue:
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Year:
2013

Abstract

As computer architectures evolve, a growing number of HPC applications must adopt thread-based parallelism and pay close attention to memory consumption. These changes call for more attention to the whole memory management chain, which is particularly stressed in multi-threaded contexts. Several memory allocators improve scalability on the user-space side, but with the steadily increasing number of cores, the impact of the operating system can no longer be neglected. We measured that the OS memory subsystem can account for up to one third of the total execution time of a real application on 128 cores. On modern architectures, we also measured that up to 40% of the page fault time is spent in page zeroing. In this paper, we detail a proposal to improve paging performance by removing the need for this unproductive page zeroing through an extension of the mmap semantics. To this end, we added a per-process kernel-level memory page pool that locally reuses freed pages without resetting their content. Our experiments show significant performance improvements, especially for huge pages.
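
The page zeroing discussed in the abstract can be observed from user space. The sketch below is an illustration only, not the authors' code: it assumes a POSIX system, and the helper names `map_anonymous` and `nonzero_bytes_after_remap` are hypothetical. It dirties a private anonymous mapping, releases it, maps the same amount of memory again, and counts the non-zero bytes in the new mapping. With the standard mmap semantics the kernel hands out zero-filled pages, so the count is 0 even though the same process just freed dirty pages. The proposed extension would relax exactly this guarantee for pages reused within one process; its interface is not described in the abstract and is not shown here.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # system page size in bytes


def map_anonymous(length: int) -> mmap.mmap:
    """Create a private anonymous mapping (hypothetical helper, POSIX-only flags)."""
    return mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def nonzero_bytes_after_remap(num_pages: int) -> int:
    """Dirty a mapping, unmap it, map again, and count non-zero bytes in the new one."""
    length = num_pages * PAGE_SIZE

    first = map_anonymous(length)
    first[:] = b"\xAA" * length  # first touch of each page triggers a page fault
    first.close()                # releases the mapping (munmap)

    second = map_anonymous(length)
    data = second[:]             # reading copies the page contents out as bytes
    second.close()

    return length - data.count(b"\x00")


if __name__ == "__main__":
    # Standard anonymous mmap semantics guarantee zero-filled pages.
    print(nonzero_bytes_after_remap(256))
```

The zero count reflects the cost the paper targets: every freshly faulted page is cleared by the kernel before the process sees it, whether or not the application needs that.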