Controlling cache utilization of HPC applications

Authors:
Swann Perarnau;Marc Tchiboukdjian;Guillaume Huard
Affiliations:
Grenoble University, Grenoble, France;Grenoble University, Grenoble, France;Grenoble University, Grenoble, France
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Abstract

This paper discusses the use of software cache partitioning to study and improve the cache behavior of HPC applications. Most existing studies use such partitioning to address quality-of-service issues, such as the fair distribution of a shared cache among running processes. We believe that in the HPC context, where a single application is studied or optimized on the system with a single thread per core, cache partitioning can be used in new and interesting ways. First, we propose an implementation of software cache partitioning based on the well-known page coloring technique. It differs from existing implementations by giving control of the partitioning to the application programmer. Developed on Linux, the most popular OS in HPC, this cache control scheme is simple to use and has low memory and CPU overhead. Second, we illustrate how user-controlled cache partitioning enables efficient measurement of the cache behavior of a parallel scientific visualization application. While current tools require expensive binary instrumentation of an application to obtain its working sets, our method needs only a few runs of the unmodified application on the target platform. Finally, we discuss the use of our scheme to optimize memory-intensive applications by isolating each of their critical data structures in a dedicated cache partition. This isolation allows the cache requirements of each structure to be analyzed and leads to new and significant optimization strategies. To the best of our knowledge, no other existing tool enables such tuning of HPC applications.
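
To make the page coloring idea mentioned in the abstract concrete, the following is a minimal sketch of the arithmetic behind it, not the paper's implementation. It assumes a physically indexed, set-associative cache with plain modulo set indexing (some processors hash addresses across cache slices, which this model ignores). The names `CacheGeometry`, `color_of`, `partition_bytes`, and `frames_with_colors` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Simplified model of a physically indexed, set-associative cache."""
    size_bytes: int        # total cache capacity
    associativity: int     # number of ways
    page_size: int = 4096  # assumed page size in bytes

    @property
    def num_colors(self) -> int:
        # One way of the cache covers size/associativity bytes; each
        # page-sized slice of that range is one color.
        way_size = self.size_bytes // self.associativity
        return max(1, way_size // self.page_size)

    def color_of(self, phys_addr: int) -> int:
        # Pages whose frame numbers are equal modulo num_colors map to
        # the same group of cache sets, so they compete for the same lines.
        return (phys_addr // self.page_size) % self.num_colors

    def partition_bytes(self, n_colors: int) -> int:
        # Cache capacity reachable by data restricted to n_colors colors.
        return n_colors * self.page_size * self.associativity


def frames_with_colors(geom, frame_numbers, allowed_colors):
    """Keep only the physical frames whose color is in allowed_colors."""
    return [f for f in frame_numbers
            if geom.color_of(f * geom.page_size) in allowed_colors]


if __name__ == "__main__":
    # Hypothetical 8 MiB, 16-way cache with 4 KiB pages.
    geom = CacheGeometry(size_bytes=8 * 1024 * 1024, associativity=16)
    print("colors:", geom.num_colors)
    print("bytes usable with 32 colors:", geom.partition_bytes(32))
    print("frames of color 0 or 1:",
          frames_with_colors(geom, range(260), {0, 1}))
```

Under this model, backing a data structure only with frames of a chosen set of colors confines it to the matching fraction of the cache, which is the kind of control the abstract describes handing to the application programmer.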