In recent years the High Performance Computing (HPC) industry has benefited from the development of higher-density multi-core processors. With recent chips capable of executing up to 32 tasks in parallel, this rate of growth shows no sign of slowing. Alongside the development of denser microprocessors has come a considerably more modest rate of improvement in random access memory (RAM). The effect has been that the available memory-per-core has fallen, and current projections suggest it is set to fall still further. In this paper we present three studies into the use and measurement of memory in parallel applications; our aim is to capture, understand and, where possible, reduce the memory-per-core needed by complete multi-component applications. First, we present benchmarked memory usage and runtimes of six scientific benchmarks, which represent algorithms common to a host of production-grade codes. The memory usage of each benchmark is measured and reported for a variety of compiler toolkits, and we show a 30% variation in memory high-water-mark requirements between compilers. Second, we combine this benchmark data with runtime data to simulate, via the Maui scheduler simulator, the effect on a multi-science workflow of reducing memory-per-core from 1.5GB to only 256MB. Finally, we present initial results from a new memory profiling tool currently in development at the University of Warwick. Applied to a finite-element benchmark, this tool is able to map high-water-mark memory allocations to individual program functions. This demonstrates a lightweight and accurate method of identifying potential memory problems, a technique we expect to become commonplace as memory capacities decrease.
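The high-water mark referred to above is the peak resident memory a process has used over its lifetime. As a minimal illustration of how such a figure can be obtained (this is not the paper's profiling tool, which attributes allocations to individual functions), the sketch below reads the `VmHWM` field that the Linux kernel exposes in `/proc/self/status`:

```python
def read_hwm_kib():
    """Return this process's peak resident set size (VmHWM) in KiB.

    Linux-specific: parses /proc/self/status, where the kernel reports
    the high-water mark of resident memory for the current process.
    """
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])  # field is reported in kB
    raise RuntimeError("VmHWM not found (non-Linux system?)")


baseline = read_hwm_kib()
buf = bytearray(50 * 1024 * 1024)  # allocate and zero-fill ~50 MiB
after = read_hwm_kib()
print(f"high-water mark grew by ~{(after - baseline) // 1024} MiB")
```

Sampling this value at function entry and exit is one lightweight way to associate peak memory growth with program regions, at far lower overhead than heavyweight binary instrumentation.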