In recent years the High Performance Computing (HPC) industry has benefited from the development of higher-density multi-core processors. With recent chips capable of executing up to 32 tasks in parallel, this rate of growth shows no sign of slowing. Alongside the development of denser microprocessors has come a considerably more modest rate of improvement in random access memory (RAM). The effect has been that the available memory-per-core has fallen, and current projections suggest it is set to fall still further. In this paper we present three studies into the use and measurement of memory in parallel applications; our aim is to capture, understand and, where possible, reduce the memory-per-core needed by complete multi-component applications. First, we present benchmarked memory usage and runtimes of six scientific benchmarks, which represent algorithms common to a host of production-grade codes. The memory usage of each benchmark is measured and reported for a variety of compiler toolkits, and we show a 30% variation in memory high-water-mark requirements between compilers. Second, we combine this benchmark data with runtime data to simulate, via the Maui scheduler simulator, the effect on a multi-science workflow of reducing memory-per-core from 1.5GB to only 256MB. Finally, we present initial results from a new memory profiling tool currently in development at the University of Warwick. Applied to a finite-element benchmark, this tool is able to map high-water-mark memory allocations to individual program functions. This demonstrates a lightweight and accurate method of identifying potential memory problems, a technique we expect to become commonplace as memory capacities decrease.
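The high-water mark referred to above is the peak resident memory a process has used over its lifetime. As a minimal illustration of how such a figure can be obtained (this is not the paper's profiling tool, which attributes allocations to individual functions), the sketch below reads the `VmHWM` field that the Linux kernel exposes in `/proc/self/status`:

```python
def read_hwm_kib():
    """Return this process's peak resident set size (VmHWM) in KiB.

    Linux-specific: parses /proc/self/status, where the kernel reports
    the high-water mark of resident memory for the current process.
    """
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])  # field is reported in kB
    raise RuntimeError("VmHWM not found (non-Linux system?)")


baseline = read_hwm_kib()
buf = bytearray(50 * 1024 * 1024)  # allocate and zero-fill ~50 MiB
after = read_hwm_kib()
print(f"high-water mark grew by ~{(after - baseline) // 1024} MiB")
```

Sampling this value at function entry and exit is one lightweight way to associate peak memory growth with program regions, at far lower overhead than heavyweight binary instrumentation.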