Sharing-aware OS scheduling algorithms for multi-socket multi-core servers

Authors:
Murthy Durbhakula
Affiliations:
Advanced Micro Devices (AMD), Inc., Bangalore, India
Venue:
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Year:
2008




Abstract

Major chip manufacturers have all introduced multi-core microprocessors, and multi-socket systems built from these processors are routinely used to run server applications. Typically the cores of each processor in such a system share a cache at either the L2 or L3 level. Depending on the application, inter-socket cache-to-cache transfers can affect overall performance. This paper presents a new operating system (OS) scheduling optimization that reduces the impact of such transfers. We observe the pattern of cache-to-cache transfers between every pair of threads during each scheduling quantum and apply one of four algorithms to derive a new thread schedule for the next quantum. The new schedule can reduce the number of inter-socket cache-to-cache transfers in that quantum. We studied the impact of these algorithms on 18 real-world benchmarks. Inter-socket cache-to-cache transfers were reduced by as much as 99.3% on some benchmarks; averaged across the benchmarks, the change ranged from -5.5% (an increase in transfers) to a 24% reduction, depending on the scheduling algorithm employed.
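
The abstract does not describe the four algorithms, so the following is only an illustrative sketch of the general idea: given a matrix of pairwise cache-to-cache transfer counts observed in one quantum, choose a thread-to-socket placement for the next quantum that keeps heavily sharing threads on the same socket. The greedy heuristic, the function names `greedy_placement` and `inter_socket_transfers`, and the sample counts are all assumptions made for illustration, not the paper's method or data.

```python
from itertools import combinations


def inter_socket_transfers(transfers, placement):
    """Sum the transfer counts of thread pairs placed on different sockets."""
    return sum(
        transfers[a][b]
        for a, b in combinations(range(len(placement)), 2)
        if placement[a] != placement[b]
    )


def greedy_placement(transfers, num_sockets, cores_per_socket):
    """Hypothetical greedy heuristic (not one of the paper's algorithms).

    transfers[a][b] is the symmetric count of cache-to-cache transfers
    observed between threads a and b during the last quantum.
    Returns a list mapping each thread index to a socket index.
    """
    n = len(transfers)
    if n > num_sockets * cores_per_socket:
        raise ValueError("more threads than available cores")

    placement = [-1] * n                      # -1 means "not placed yet"
    free = [cores_per_socket] * num_sockets   # free cores per socket

    # Place the threads with the most total sharing first.
    order = sorted(range(n), key=lambda t: sum(transfers[t]), reverse=True)
    for t in order:
        best_socket, best_affinity = None, -1
        for s in range(num_sockets):
            if free[s] == 0:
                continue
            # Affinity: transfers between t and threads already on socket s.
            affinity = sum(
                transfers[t][u] for u in range(n) if placement[u] == s
            )
            if affinity > best_affinity:
                best_socket, best_affinity = s, affinity
        placement[t] = best_socket
        free[best_socket] -= 1
    return placement


# Made-up counts: threads 0 and 2 share heavily, as do threads 1 and 3.
observed = [
    [0, 1, 100, 2],
    [1, 0, 3, 80],
    [100, 3, 0, 1],
    [2, 80, 1, 0],
]
next_schedule = greedy_placement(observed, num_sockets=2, cores_per_socket=2)
print(next_schedule)                                         # [0, 1, 0, 1]
print(inter_socket_transfers(observed, [0, 0, 1, 1]))        # 185
print(inter_socket_transfers(observed, next_schedule))       # 7
```

With these made-up counts, the default placement of threads 0 and 1 on one socket and threads 2 and 3 on the other incurs 185 inter-socket transfers, while the greedy placement incurs 7. A heuristic like this assumes the sharing pattern of one quantum predicts the next. When that assumption fails, a new schedule can increase transfers, which is one plausible reading of the negative average reported for some algorithms.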