Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Proceedings of the 30th annual international symposium on Computer architecture
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
On mitigating memory bandwidth contention through bandwidth-aware scheduling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
An empirical model for predicting cross-core performance interference on multicore processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Competition for shared memory resources on multiprocessors is the most dominant cause for slowing down applications and makes their performance varies unpredictably. It exacerbates the need for Quality of Service (QoS) on such systems. In this paper, we propose a fair-progress process scheduling (FPS) policy to improve system fairness. Its strategy is to force the equally-weighted applications to have the same amount of slowdown when they run concurrently. The basic approach is to monitor the progress of all applications at runtime. When we find an application suffered more slowdown and accumulated less effective work than others, we allocate more CPU time to give it a better parity. Our policy also allows different weights to different threads, and provides an effective and robust tuner that allows the OS to freely make tradeoffs between system fairness and higher throughput. Evaluation results show that FPS can significantly improve system fairness by an average of 53.5% and 65.0% on a 4-core processor with a private cache and a 4-core processor with a shared cache, respectively. The penalty is about 1.1% and 1.6% of the system throughput. For memory-intensive workloads, FPS also improves system fairness by an average of 45.2% and 21.1% on 4-core and 8-core system respectively at the expense of a throughput loss of about 2%.