As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When multiple applications run simultaneously on a platform, contention for shared platform resources such as the last-level cache has been shown to severely degrade performance and quality of service (QoS). Yet today's platforms cannot accurately monitor shared cache usage or disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of shared cache resources along three vectors: (a) occupancy -- how much space is being used, and by whom; (b) interference -- how much contention is present, and who is being affected; and (c) sharing -- how threads are cooperating. We propose the CacheScouts monitoring architecture, which combines novel tagging (software-guided monitoring IDs) and sampling (set sampling) mechanisms to achieve shared cache monitoring on a per-application basis at low overhead (≤0.1%) and with very little loss of accuracy (≤5%). We also present case studies showing how operating systems (OS) and virtual machine monitors (VMMs) can use CacheScouts for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS, and (d) metering for chargeback.
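The combination of per-line monitoring IDs and set sampling can be illustrated with a small software model: each cache line carries the monitoring ID of the application that fetched it, and occupancy is estimated by counting tags in only a subset of sets, then scaling up. The sketch below is purely illustrative (the cache geometry, sampling stride, and synthetic ownership pattern are made up for the example, not taken from the paper's hardware design):

```python
# Illustrative model of set sampling for per-application cache occupancy.
# Parameters and the ownership pattern are assumptions for this sketch.

NUM_SETS, WAYS = 64, 8
SAMPLE_STRIDE = 4  # monitor only every 4th set

# Each line holds the monitoring ID of its owner; here ownership is
# synthesized deterministically across three hypothetical applications.
cache = [[(s * WAYS + w) % 3 for w in range(WAYS)] for s in range(NUM_SETS)]

def true_occupancy(cache):
    """Exact per-ID line counts, scanning every set (the costly baseline)."""
    counts = {}
    for cache_set in cache:
        for owner in cache_set:
            counts[owner] = counts.get(owner, 0) + 1
    return counts

def sampled_occupancy(cache, stride):
    """Estimate per-ID occupancy from sampled sets, scaled to full size."""
    sampled = cache[::stride]
    counts = {}
    for cache_set in sampled:
        for owner in cache_set:
            counts[owner] = counts.get(owner, 0) + 1
    scale = len(cache) / len(sampled)
    return {mid: round(c * scale) for mid, c in counts.items()}

exact = true_occupancy(cache)
estimate = sampled_occupancy(cache, SAMPLE_STRIDE)
for mid in sorted(exact):
    err = abs(estimate[mid] - exact[mid]) / exact[mid]
    print(f"ID {mid}: exact={exact[mid]} sampled={estimate[mid]} err={err:.2%}")
```

In this toy configuration the sampled estimate lands within about 1% of the exact count while inspecting only a quarter of the sets, which is the intuition behind keeping both the monitoring overhead and the accuracy loss small.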