PSMalloc: content based memory management for MPI applications

  • Authors:
  • Susmit Biswas, Diana Franklin, Timothy Sherwood, and Frederic T. Chong (University of California, Santa Barbara); Bronis R. de Supinski and Martin Schulz (Lawrence Livermore National Laboratory)

  • Venue:
  • Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
  • Year:
  • 2009

Abstract

Multicore processors have come to dominate the commodity market upon which many large-scale systems are based. The number of cores is increasing at the pace of Moore's law and, as a direct consequence, the memory available per core is decreasing, often severely limiting the problem size for programs running on such platforms. Mechanisms that store data more efficiently in DRAM, increasing its effective capacity without requiring any reprogramming, would therefore dramatically increase the benefits of multicore nodes for large-scale systems. We observe that MPI programs replicate a significant amount of data across all processes. With multiple MPI tasks running on a single node, this replication leads to identical data residing in multiple locations in that node's DRAM, an ideal candidate for savings. We have found that most of this redundant data resides in the heap; smart memory allocation can therefore remove the redundancy and increase the effective memory capacity. We present PSMalloc, a memory allocation library that keeps a single copy of identical pages across a set of MPI tasks. PSMalloc is implemented as a user-level library that can be linked at runtime, avoiding changes to the application or the operating system. To the best of our knowledge, our work is the first to reduce the physical memory footprint of MPI tasks on a multicore node without requiring kernel-level modifications. We experiment with four MPI applications from the ASC Sequoia benchmark suite and show a reduction in memory footprint of up to 22%, and of 11.18% on average.
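The sketch below is a minimal, hypothetical illustration of the content-based sharing idea the abstract describes: hash a page's contents and, when a duplicate is found, back it with a single shared physical copy. It is not the PSMalloc implementation; the pool file, the page_hash and share_if_identical helpers, and the single-process setup are assumptions made purely for illustration. A real allocator following this approach would interpose malloc across the MPI tasks on a node, keep its page index in shared memory, and handle writes to merged pages, all of which the sketch omits.

/* Hypothetical sketch of content-based page sharing (not PSMalloc itself). */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define PAGE 4096

/* FNV-1a hash of one page; a real allocator would also memcmp on a hash match. */
static uint64_t page_hash(const void *p) {
    const unsigned char *b = p;
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < PAGE; i++) { h ^= b[i]; h *= 1099511628211ULL; }
    return h;
}

/* If 'dup' matches the canonical page stored in shared file 'fd' at offset
 * 'off', replace its private backing with a shared read-only mapping, so
 * both virtual pages are served by one physical frame. */
static int share_if_identical(void *dup, int fd, off_t off, uint64_t canon_hash) {
    if (page_hash(dup) != canon_hash) return 0;
    void *m = mmap(dup, PAGE, PROT_READ, MAP_SHARED | MAP_FIXED, fd, off);
    return m == dup;
}

int main(void) {
    /* Shared backing store; one task per node could own such a pool. */
    int fd = memfd_create("psmalloc-pool", 0);   /* hypothetical pool name */
    if (fd < 0 || ftruncate(fd, PAGE) != 0) return 1;

    /* Two page-aligned "heap" pages with identical content. */
    void *a = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *b = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(a, 0x5a, PAGE);
    memset(b, 0x5a, PAGE);

    /* Publish 'a' as the canonical copy in the shared pool. */
    void *pool = mmap(NULL, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(pool, a, PAGE);
    uint64_t h = page_hash(pool);

    /* Fold the duplicate page 'b' onto the canonical copy. */
    if (share_if_identical(b, fd, 0, h))
        printf("page deduplicated; contents still match: %d\n",
               memcmp(a, b, PAGE) == 0);
    close(fd);
    return 0;
}

Because the merged page is mapped read-only, a later store to it would fault; an allocator built on this idea would need to catch that fault and hand the writing task back a private copy before letting the write proceed.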