Implementing stack simulation for highly-associative memories
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Improving Disk Cache Hit-Ratios Through Cache Partitioning
IEEE Transactions on Computers
Optimal Partitioning of Cache Memory
IEEE Transactions on Computers
Informed prefetching and caching
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Multi-Queue Replacement Algorithm for Second Level Buffer Caches
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
OS-Controlled Cache Predictability for Real-Time Systems
RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
A resource allocation model for QoS management
RTSS '97 Proceedings of the 18th IEEE Real-Time Systems Symposium
Proceedings of the 30th annual international symposium on Computer architecture
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Dynamic tracking of page miss ratio curve for memory management
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
Online performance analysis by statistical sampling of microprocessor performance counters
Proceedings of the 19th annual international conference on Supercomputing
An analytical model for cache replacement policy performance
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
StatCache: a probabilistic approach to efficient and accurate data locality analysis
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Using Valgrind to detect undefined value errors with bit-precision
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Performance of multithreaded chip multiprocessors and implications for operating system design
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Ubiquitous memory introspection
Proceedings of the International Symposium on Code Generation and Optimization
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
JIT instrumentation: a novel approach to dynamically instrument operating systems
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Path: page access tracking to improve memory management
Proceedings of the 6th international symposium on Memory management
CRAMM: virtual memory support for garbage-collected applications
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Dynamic partitioning of the cache hierarchy in shared data centers
Proceedings of the VLDB Endowment
Multi-optimization power management for chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
Maximizing power efficiency with asymmetric multicore systems
Communications of the ACM - Finding the Fun in Computer Science Education
Managing contention for shared resources on multicore processors
Communications of the ACM
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Managing Contention for Shared Resources on Multicore Processors
Queue - Power Management
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Performance and power modeling in a multi-programmed multi-core environment
Proceedings of the 47th Design Automation Conference
On mitigating memory bandwidth contention through bandwidth-aware scheduling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Generalized ERSS tree model: Revisiting working sets
Performance Evaluation
Contention-Aware Scheduling on Multicore Systems
ACM Transactions on Computer Systems (TOCS)
Journal of Parallel and Distributed Computing
Online cache modeling for commodity multicore processors
ACM SIGOPS Operating Systems Review
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches
ACM Transactions on Architecture and Code Optimization (TACO)
Mind the gap: reconnecting architecture and OS research
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Proceedings of the international symposium on Memory management
Low cost working set size tracking
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Evaluating placement policies for managing capacity sharing in CMP architectures with private caches
ACM Transactions on Architecture and Code Optimization (TACO)
REEact: a customizable virtual execution manager for multicore platforms
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Toward predictable performance in software packet-processing platforms
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures
Proceedings of the 26th ACM international conference on Supercomputing
Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Reducing last level cache pollution in NUMA multicore systems for improving cache performance
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
A collaborative memory system for high-performance and cost-effective clustered architectures
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Responding rapidly to service level violations using virtual appliances
ACM SIGOPS Operating Systems Review
HOTL: a higher order theory of locality
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
ACM Transactions on Architecture and Code Optimization (TACO)
On modeling contention for shared caches in multi-core processors with techniques from ecology
Natural Computing: an international journal
Virtual machine consolidation based on interference modeling
The Journal of Supercomputing
A performance-aware quality of service-driven scheduler for multicore processors
ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)
Hi-index | 0.02 |
Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs enable us to dynamically identify optimal cache sizes when cache-partitioning a shared-cache multicore processor. Obtaining L2 MRCs has generally been assumed to be expensive when done in software and consequently, their usage for online optimizations has been limited. To address these problems and opportunities, we have developed a low-overhead software technique to obtain L2 MRCs online on current processors, exploiting features available in their performance monitoring units so that no changes to the application source code or binaries are required. Our technique, called RapidMRC, requires a single probing period of roughly 221 million processor cycles (147 ms), and subsequently 124 million cycles (83 ms) to process the data. We demonstrate its accuracy by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We show that RapidMRC can be applied to sizing cache partitions, helping to achieve performance improvements of up to 27%.