The extensive use of multi-threaded applications on SMP machines justifies modifying thread scheduling algorithms to take threads' characteristics into account in order to improve performance. Current schedulers (e.g., in Linux and AIX) avoid migrating tasks between CPUs unless absolutely necessary. Unwarranted data cache misses occur when tasks that share data run on different CPUs, or run far apart in time on the same CPU. This work presents an extension to the Linux scheduler that exploits inter-task data relations to reduce data cache misses in multi-threaded applications running on SMP platforms, thus improving runtime, memory throughput, and energy consumption. Our approach schedules each task on the CPU that holds the relevant data rather than on the one with the highest affinity. We observed improvements in CPU time and throughput on several benchmarks. For the Chat benchmark, the improvement in CPU time and cache misses is over 30% on average.
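The placement idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the scheduler extension itself: the `Cpu` class, the `pick_cpu` function, the `shares_with` mapping, and the use of a short per-CPU history as a stand-in for cache contents are all hypothetical names and simplifications introduced here for illustration.

```python
from collections import deque


class Cpu:
    """Toy CPU model (hypothetical): remembers the last few tasks it ran."""

    def __init__(self, cpu_id, history=4):
        self.cpu_id = cpu_id
        # Recently run task names; a crude proxy for "data still in cache".
        self.recent = deque(maxlen=history)

    def run(self, task):
        self.recent.append(task)


def pick_cpu(task, last_cpu, cpus, shares_with):
    """Pick the CPU whose recent tasks share the most data with `task`.

    `shares_with` maps a task name to the set of task names it shares data
    with (an assumed input; how such relations are obtained is not modeled).
    If no CPU scores strictly higher than the task's previous CPU, fall back
    to `last_cpu`, which corresponds to plain affinity scheduling.
    """
    peers = shares_with.get(task, set())

    def score(cpu):
        return sum(1 for t in cpu.recent if t in peers)

    best = max(cpus, key=score)
    return best if score(best) > score(last_cpu) else last_cpu


# Example: "consumer" last ran on CPU 0, but its data-sharing peer
# "producer" ran most recently on CPU 1.
cpus = [Cpu(0), Cpu(1)]
cpus[0].run("logger")
cpus[1].run("producer")
shares = {"consumer": {"producer"}}

chosen = pick_cpu("consumer", cpus[0], cpus, shares)
unrelated = pick_cpu("idle_task", cpus[0], cpus, shares)
```

In this toy run, `chosen` is CPU 1 because that is where the related data was last touched, while `unrelated` stays on CPU 0 because a task with no known data relations keeps its affinity. A real scheduler would also have to weigh load balance and migration cost, which this sketch ignores.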