Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Just Say No: Benefits of Early Cache Miss Determination
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Performance prediction based on inherent program similarity
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Storage performance virtualization via throughput and latency control
ACM Transactions on Storage (TOS)
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Power provisioning for a warehouse-sized computer
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 34th annual international symposium on Computer architecture
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Power-aware dynamic placement of HPC applications
Proceedings of the 22nd annual international conference on Supercomputing
Enforcing performance isolation across virtual machines in Xen
Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware
Towards practical page coloring-based multicore cache management
Proceedings of the 4th ACM European conference on Computer systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
VM3: Measuring, modeling and managing VM shared resources
Computer Networks: The International Journal of Computer and Telecommunications Networking
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Q-clouds: managing performance interference effects for QoS-aware clouds
Proceedings of the 5th European conference on Computer systems
Online cache modeling for commodity multicore processors
ACM SIGOPS Operating Systems Review
Runtime measurements in the cloud: observing, analyzing, and reducing variance
Proceedings of the VLDB Endowment
EC2 performance analysis for resource provisioning of service-oriented applications
ICSOC/ServiceWave'09 Proceedings of the 2009 international conference on Service-oriented computing
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
D-factor: a quantitative model of application slow-down in multi-resource shared systems
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Achieving application-centric performance targets via consolidation on multicores: myth or reality?
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Modeling and performance analysis of large scale IaaS Clouds
Future Generation Computer Systems
Future Generation Computer Systems
CPI2: CPU performance isolation for shared compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
An experimental study of cascading performance interference in a virtualized environment
ACM SIGMETRICS Performance Evaluation Review
Resource efficient computing for warehouse-scale datacenters
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and modeling of PIDX parallel I/O for performance optimization
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Enabling fair pricing on HPC systems with node sharing
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Towards a performance-as-a-service cloud
Proceedings of the 4th annual Symposium on Cloud Computing
QoS-Aware scheduling in heterogeneous datacenters with paragon
ACM Transactions on Computer Systems (TOCS)
Group-based memory oversubscription for virtualized clouds
Journal of Parallel and Distributed Computing
Virtual machine consolidation based on interference modeling
The Journal of Supercomputing
Virtual Machine Coscheduling: A Game Theoretic Approach
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
Hi-index | 0.00 |
Workload consolidation is very attractive for cloud platforms due to several reasons including reduced infrastructure costs, lower energy consumption, and ease of management. Advances in virtualization hardware and software continue to improve resource isolation among consolidated workloads but a particular form of resource interference is yet to see a commercially widely adopted solution - the interference due to shared processor caches. Existing solutions for handling cache interference require new hardware features, extensive software changes, or reduce the achieved overall throughput. A crucial requirement for effective consolidation is to be able to predict the impact of cache interference among consolidated workloads. In this paper, we present a practical technique for predicting performance interference due to shared processor cache which works on current processor architectures and requires minimal software changes. While performance degradation can be empirically measured for a given placement of consolidated workloads, the number of possible placements grows exponentially with the number of workloads and actual measurement of degradation is thus not practical for every possible placement. Our technique predicts the degradation for any possible placement using only a linear number of measurements, and can be used to select the most efficient consolidation pattern, for required performance and resource constraints. An average prediction error of less than 4% is achieved across a wide variety of benchmark workloads, using Xen VMM on Intel Core 2 Duo and Nehalem quad-core processor platforms. We also illustrate the usefulness of our prediction technique in realizing better workload placement decisions for given performance and resource cost objectives.