Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Application-controlled physical memory using external page-cache management
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Exokernel: an operating system architecture for application-level resource management
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Visualization Handbook
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
A Portable Programming Interface for Performance Evaluation on Modern Processors
International Journal of High Performance Computing Applications
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Linux Device Drivers, 3rd Edition
Linux Device Drivers, 3rd Edition
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Binary Mesh Partitioning for Cache-Efficient Visualization
IEEE Transactions on Visualization and Computer Graphics
The International Exascale Software Project roadmap
International Journal of High Performance Computing Applications
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures
Proceedings of the 26th ACM international conference on Supercomputing
Hi-index | 0.00 |
This paper discusses the use of software cache partitioning techniques to study and improve cache behavior of HPC applications. Most existing studies use this partitioning to solve quality of service issues, like fair distribution of a shared cache among running processes. We believe that, in the HPC context of a single application being studied/optimized on the system, with a single thread per core, cache partitioning can be used in new and interesting ways. First, we propose an implementation of software cache partitioning using the well known page coloring technique. This implementation differs from existing ones by giving control of the partitioning to the application programmer. Developed on the most popular OS in HPC (Linux), this cache control scheme has low overhead both in memory and CPU while being simple to use. Second, we illustrate how this user-controlled cache partitioning can lead to efficient measurements of cache behavior of a parallel scientific visualization application. While current tools require expensive binary instrumentation of an application to obtain its working sets, our method only needs a few unmodified runs on the target platform. Finally, we discuss the use of our scheme to optimize memory intensive applications by isolating each of their critical data structures into dedicated cache partitions. This isolation allows the analysis of each structure cache requirements and leads to new and significant optimization strategies. To the best of our knowledge, no other existing tool enables such tuning of HPC applications.