Enabling software management for multicore caches with a lightweight hardware support

Authors:
Jiang Lin;Qingda Lu;Xiaoning Ding;Zhao Zhang;Xiaodong Zhang;P. Sadayappan
Affiliations:
Iowa State University, Ames, IA;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;Iowa State University, Ames, IA;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Citing 19
Cited 5

The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Page placement algorithms for large real-indexed caches

ACM Transactions on Computer Systems (TOCS)
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler blockability of numerical algorithms

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Avoiding conflict misses dynamically in large direct-mapped caches

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Reducing cache misses using hardware and software page placement

ICS '99 Proceedings of the 13th international conference on Supercomputing
The TLB slice—a low-cost high-speed address translation mechanism

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses

IEEE Transactions on Computers
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
IBM Power5 Chip: A Dual-Core Multithreaded Processor

IEEE Micro

ULCC: a user-level facility for optimizing shared cache performance on multicores

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
W-Order scan: minimizing cache pollution by application software level cache management for MMDB

WAIM'11 Proceedings of the 12th international conference on Web-age information management
Reducing last level cache pollution through OS-level software-controlled region-based partitioning

Proceedings of the 27th Annual ACM Symposium on Applied Computing
Survey of scheduling techniques for addressing shared resources in multicore processors

ACM Computing Surveys (CSUR)
A survey on cache tuning from a power/energy perspective

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

The management of shared caches in multicore processors is a critical and challenging task. Many hardware and OS-based methods have been proposed. However, they may be hardly adopted in practice due to their non-trivial overheads, high complexities, and/or limited abilities to handle increasingly complicated scenarios of cache contention caused by many-cores. In order to turn cache partitioning methods into reality in the management of multicore processors, we propose to provide an affordable and lightweight hardware support to coordinate with OS-based cache management policies. The proposed methods are scalable to many-cores, and perform comparably with other proposed hardware solutions, but have much lower overheads, therefore can be easily adopted in commodity processors. Having conducted extensive experiments with 37 multi-programming workloads, we show the effectiveness and scalability of the proposed methods. For example on 8-core systems, one of our proposed policies improves performance over LRU-based hardware cache management by 14.5% on average.