RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Authors:
David K. Tam;Reza Azimi;Livio B. Soares;Michael Stumm
Affiliations:
University of Toronto, Toronto, Canada;University of Toronto, Toronto, Canada;University of Toronto, Toronto, Canada;University of Toronto, Toronto, Canada
Venue:
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Year:
2009

Abstract

Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs enable us to dynamically identify optimal cache sizes when partitioning the cache of a shared-cache multicore processor. Obtaining L2 MRCs has generally been assumed to be expensive when done in software; consequently, their use for online optimizations has been limited. To address this problem, we have developed a low-overhead software technique that obtains L2 MRCs online on current processors. It exploits features available in their performance monitoring units, so no changes to application source code or binaries are required. Our technique, called RapidMRC, requires a single probing period of roughly 221 million processor cycles (147 ms), followed by 124 million cycles (83 ms) to process the data. We demonstrate its accuracy by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We show that RapidMRC can be applied to sizing cache partitions, helping to achieve performance improvements of up to 27%.
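The abstract does not spell out how a miss rate curve is derived from the probed data. As a rough illustration only, the sketch below uses a textbook LRU stack-distance calculation (Mattson's stack algorithm) on a short trace of cache-line identifiers, then picks a two-way partition split from two curves. It assumes a fully associative LRU cache and equal weighting of the two applications. The names `miss_rate_curve` and `best_split` are hypothetical and are not taken from RapidMRC itself.

```python
def miss_rate_curve(trace, max_lines):
    """Miss rate for cache sizes 1..max_lines (in lines), assuming a
    fully associative LRU cache. Illustrative sketch, not RapidMRC."""
    if not trace:
        raise ValueError("trace must contain at least one access")
    stack = []                      # LRU stack, stack[0] is most recently used
    hist = [0] * (max_lines + 1)    # hist[d] = accesses with stack distance d
    for line in trace:
        if line in stack:
            depth = stack.index(line) + 1   # 1-based stack distance
            if depth <= max_lines:
                hist[depth] += 1
            stack.remove(line)
        stack.insert(0, line)       # the accessed line becomes most recent
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_lines + 1):
        hits += hist[size]          # a cache of `size` lines hits every distance <= size
        curve.append((total - hits) / total)
    return curve


def best_split(curve_a, curve_b, total_units):
    """Choose (units_a, units_b) with units_a + units_b == total_units that
    minimizes the summed miss rates. Assumes both applications issue a
    similar number of accesses (a simplification made for this sketch)."""
    splits = ((a, total_units - a) for a in range(1, total_units))
    return min(splits, key=lambda s: curve_a[s[0] - 1] + curve_b[s[1] - 1])


if __name__ == "__main__":
    mrc_a = miss_rate_curve(["a", "b", "a", "c", "b", "a"], 4)
    mrc_b = miss_rate_curve(["x", "x", "x", "x"], 4)
    print(mrc_a, mrc_b, best_split(mrc_a, mrc_b, 4))
```

The list-based stack keeps the sketch short but costs time proportional to the stack depth on every access. An online implementation would need a cheaper structure and a bounded stack size.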