Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Managing energy and server resources in hosting centers
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A Framework for Providing Quality of Service in Chip Multi-Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
PowerNap: eliminating server idle power
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
FlexDCP: a QoS framework for CMP architectures
ACM SIGOPS Operating Systems Review
Probabilistic job symbiosis modeling for SMT processor scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Contention aware execution: online contention detection and response
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Towards characterizing cloud backend workloads: insights from Google compute clusters
ACM SIGMETRICS Performance Evaluation Review
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
Dynamic cache contention detection in multi-threaded applications
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
Power management of online data-intensive services
Proceedings of the 38th annual international symposium on Computer architecture
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DejaVu: accelerating resource allocation in virtualized environments
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Automated locality optimization based on the reuse distance of string operations
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
D-factor: a quantitative model of application slow-down in multi-resource shared systems
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Compiling for niceness: mitigating contention for QoS in warehouse scale computers
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Measuring interference between live datacenter applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Paragon: QoS-aware scheduling for heterogeneous datacenters
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Defensive loop tiling for shared cache
CGO '13 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
Whare-map: heterogeneity in "homogeneous" warehouse-scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Quasar: resource-efficient and QoS-aware cluster management
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Ubik: efficient cache sharing with strict qos for latency-critical workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
QoS-Aware scheduling in heterogeneous datacenters with paragon
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.00 |
Ensuring the quality of service (QoS) for latency-sensitive applications while allowing co-locations of multiple applications on servers is critical for improving server utilization and reducing cost in modern warehouse-scale computers (WSCs). Recent work relies on static profiling to precisely predict the QoS degradation that results from performance interference among co-running applications to increase the number of "safe" co-locations. However, these static profiling techniques have several critical limitations: 1) a priori knowledge of all workloads is required for profiling, 2) it is difficult for the prediction to capture or adapt to phase or load changes of applications, and 3) the prediction technique is limited to only two co-running applications. To address all of these limitations, we present Bubble-Flux, an integrated dynamic interference measurement and online QoS management mechanism to provide accurate QoS control and maximize server utilization. Bubble-Flux uses a Dynamic Bubble to probe servers in real time to measure the instantaneous pressure on the shared hardware resources and precisely predict how the QoS of a latency-sensitive job will be affected by potential co-runners. Once "safe" batch jobs are selected and mapped to a server, Bubble-Flux uses an Online Flux Engine to continuously monitor the QoS of the latency-sensitive application and control the execution of batch jobs to adapt to dynamic input, phase, and load changes to deliver satisfactory QoS. Batch applications remain in a state of flux throughout execution. Our results show that the utilization improvement achieved by Bubble-Flux is up to 2.2x better than the prior static approach.