Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses

Authors:
Andreas Sandberg; David Eklöv; Erik Hagersten
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

Contention for shared cache resources has been recognized as a major bottleneck for multicores, especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions go largely unused because it is poorly understood how best to leverage them. This paper introduces a classification of applications into four cache usage categories. We discuss how applications from different categories affect each other's performance indirectly through cache sharing and devise a scheme to optimize such sharing. We also propose a low-overhead method to automatically find the best per-instruction cache management policy. We demonstrate how the indirect cache-sharing effects of mixed workloads can be tamed by automatically altering some instructions to better manage cache resources. Practical experiments demonstrate that our software-only method can improve application performance by up to 35% on x86 multicore hardware.
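As a rough illustration of the kind of cache management instruction the abstract refers to, the sketch below streams through an array once and issues a low-locality prefetch hint for data that will not be reused. It is not taken from the paper: the function name `sum_streaming` and the prefetch distance are hypothetical, and the GCC/Clang builtin `__builtin_prefetch` is assumed to be available. With a locality argument of 0, compilers targeting x86 typically emit a non-temporal prefetch, but the hardware is free to ignore the hint.

```c
#include <stddef.h>

/* Sum an array that is read exactly once (a streaming access pattern).
 * The prefetch hint marks upcoming data as having no temporal locality,
 * so it is less likely to evict data that other code will reuse.
 * Illustrative sketch only; not the method described in the paper. */
static double sum_streaming(const double *data, size_t n)
{
    /* Prefetch distance in elements: a hypothetical value that would
     * need tuning on real hardware. */
    const size_t ahead = 16;
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            /* Arguments: address, 0 = prefetch for read,
             * 0 = lowest temporal locality. */
            __builtin_prefetch(&data[i + ahead], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

The hint changes only how the data is cached, not the result, so the function returns the same sum with or without the prefetch. Any performance effect depends on the processor and on what else shares the cache.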