Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
Major chip manufacturers have all introduced multi-core microprocessors. Multi-socket systems built from these processors are routinely used for running various server applications. Typically each processor in such a system shares a cache at either the L2 or L3 level. Depending on the application that is run on the system, inter-socket cache-to-cache transfers can impact overall performance. This paper presents a new operating system (OS) scheduling optimization to reduce the impact of such inter-socket cache-to-cache transfers. By observing the pattern of cache-to-cache transfers between every pair of threads for each scheduling quantum and applying four different algorithms, we come up with a new schedule of threads for the next quantum. This new schedule potentially cuts down the inter-socket cache-to-cache transfers for the next scheduling quantum. We studied the impact of these algorithms on 18 real-world benchmarks. For the benchmarks we studied, inter-socket cache-to-cache transfers were cut down by as much as 99.3% on some benchmarks and, on average, between -5.5% and 24% depending on the scheduling algorithm employed.