Predicting Cache Space Contention in Utility Computing Servers
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Methods for Modeling Resource Contention on Simultaneous Multithreading Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An analytical model for cache replacement policy performance
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
CMP cache performance projection: accessibility vs. capacity
ACM SIGARCH Computer Architecture News
From chaos to QoS: case studies in CMP resource management
ACM SIGARCH Computer Architecture News
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS
ACM SIGARCH Computer Architecture News
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
All-window profiling of concurrent executions
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A dynamically reconfigurable cache for multithreaded processors
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Memory hierarchy performance measurement of commercial dual-core desktop processors
Journal of Systems Architecture: the EUROMICRO Journal
Distilling the essence of proprietary workloads into miniature benchmarks
ACM Transactions on Architecture and Code Optimization (TACO)
Exploration of the Influence of Program Inputs on CMP Co-scheduling
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Dynamic memory balancing for virtual machines
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Modeling of cache access behavior based on Zipf's law
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
Proceedings of the 6th ACM conference on Computing frontiers
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Rate-based QoS techniques for cache/memory in CMP platforms
Proceedings of the 23rd international conference on Supercomputing
Push-assisted migration of real-time tasks in multi-core processors
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors
Journal of Signal Processing Systems
Dynamic memory balancing for virtual machines
ACM SIGOPS Operating Systems Review
Cache-aware scheduling and analysis for multicores
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Virtual platform architectures for resource metering in datacenters
ACM SIGMETRICS Performance Evaluation Review
Managing contention for shared resources on multicore processors
Communications of the ACM
A case for integrated processor-cache partitioning in chip multiprocessors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Resource management for isolation enhanced cloud services
Proceedings of the 2009 ACM workshop on Cloud computing security
VM3: Measuring, modeling and managing VM shared resources
Computer Networks: The International Journal of Computer and Telecommunications Networking
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the resource-sharing levels in the UltraSPARC T2 processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Thread to strand binding of parallel network applications in massive multi-threaded systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Cache partitioning for energy-efficient and interference-free embedded multitasking
ACM Transactions on Embedded Computing Systems (TECS)
Managing Contention for Shared Resources on Multicore Processors
Queue - Power Management
Modeling virtual machine performance: challenges and approaches
ACM SIGMETRICS Performance Evaluation Review
Probabilistic job symbiosis modeling for SMT processor scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resource-conscious scheduling for energy efficiency on multicore processors
Proceedings of the 5th European conference on Computer systems
Contention aware execution: online contention detection and response
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
PIRATE: QoS and performance management in CMP architectures
ACM SIGMETRICS Performance Evaluation Review
qTLB: looking inside the look-aside buffer
HiPC'07 Proceedings of the 14th international conference on High performance computing
MLP-aware dynamic cache partitioning
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Proceedings of the Workshop on Binary Instrumentation and Applications
ScaleUPC: a UPC compiler for multi-core systems
Proceedings of the Third Conference on Partitioned Global Address Space Programing Models
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
Performance and power modeling in a multi-programmed multi-core environment
Proceedings of the 47th Design Automation Conference
Accelerating multicore reuse distance analysis with sampling and parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Memory-aware scheduling for energy efficiency on multicore processors
HotPower'08 Proceedings of the 2008 conference on Power aware computing and systems
Contention-Aware Scheduling on Multicore Systems
ACM Transactions on Computer Systems (TOCS)
Quality of service shared cache management in chip multiprocessor architecture
ACM Transactions on Architecture and Code Optimization (TACO)
An efficient simulation algorithm for cache of random replacement policy
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Journal of Parallel and Distributed Computing
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proposal and evaluation of APIs for utilizing inter-core time aggregation scheduler
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Fast modeling of shared caches in multicore systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Directly characterizing cross core interference through contention synthesis
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs
Journal of Parallel and Distributed Computing
Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems
A majority-based control scheme for way-adaptable caches
Facing the multicore-challenge
Dynamic cache partitioning based on the MLP of cache misses
Transactions on high-performance embedded architectures and compilers III
Power-aware dynamic cache partitioning for CMPs
Transactions on high-performance embedded architectures and compilers III
A majority-based control scheme for way-adaptable caches
Facing the multicore-challenge
Proceedings of the international symposium on Memory management
METE: meeting end-to-end QoS in multicores through system-wide resource management
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying inter-core data reuse in multicores
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Loaf: a framework and infrastructure for creating online adaptive solutions
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
METE: meeting end-to-end QoS in multicores through system-wide resource management
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Studying inter-core data reuse in multicores
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
FACT: a framework for adaptive contention-aware thread migrations
Proceedings of the 8th ACM International Conference on Computing Frontiers
A helper thread based dynamic cache partitioning scheme for multithreaded applications
Proceedings of the 48th Design Automation Conference
W-Order scan: minimizing cache pollution by application software level cache management for MMDB
WAIM'11 Proceedings of the 12th international conference on Web-age information management
Proceedings of the 2nd ACM Symposium on Cloud Computing
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Proceedings of the International Conference on Computer-Aided Design
Optimal task assignment in multithreaded processors: a statistical approach
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Region scheduling: efficiently using the cache architectures via page-level affinity
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Preventing denial-of-service attacks in shared CMP caches
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Scalable shared-cache management by containing thrashing workloads
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Is reuse distance applicable to data locality analysis on chip multiprocessors?
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Can linear approximation improve performance prediction ?
EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
On the accuracy of cache sharing models
ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Machine learning based performance prediction for multi-core simulation
MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence
Probabilistic modeling for job symbiosis scheduling on SMT processors
ACM Transactions on Architecture and Code Optimization (TACO)
Reuse distance based performance modeling and workload mapping
Proceedings of the 9th conference on Computing Frontiers
Toward predictable performance in software packet-processing platforms
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Reducing last level cache pollution through OS-level software-controlled region-based partitioning
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Providing fairness on shared-memory multiprocessors via process scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
An experimental comparison of different real-time schedulers on multicore systems
Journal of Systems and Software
Reducing last level cache pollution in NUMA multicore systems for improving cache performance
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
Efficient techniques for predicting cache sharing and throughput
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Measuring interference between live datacenter applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs
ACM Transactions on Computer Systems (TOCS)
A Machine Learning Based Meta-Scheduler for Multi-Core Processors
International Journal of Adaptive, Resilient and Autonomic Systems
HOTL: a higher order theory of locality
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Proceedings of the 40th Annual International Symposium on Computer Architecture
Studying multicore processor scaling via reuse distance analysis
Proceedings of the 40th Annual International Symposium on Computer Architecture
Dynamic cache management in multi-core architectures through run-time adaptation
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and modeling of PIDX parallel I/O for performance optimization
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Enabling fair pricing on HPC systems with node sharing
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
An empirical model for predicting cross-core performance interference on multicore processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Black box scheduling for resource intensive virtual machine workloads with interference models
Future Generation Computer Systems
ACM Transactions on Architecture and Code Optimization (TACO)
On modeling contention for shared caches in multi-core processors with techniques from ecology
Natural Computing: an international journal
A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems
Journal of Parallel and Distributed Computing
Hi-index | 0.04 |
This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a Chip Multi-Processor (CMP) architecture. Cache sharing impacts threads non-uniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the Inductive Probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharingimpact is presented.