Optimizing system performance through scheduling has received a lot of attention. However, none of the existing approaches balances system performance improvement against a fair share of CPU time among threads. In this paper we present a shared-memory-aware scheduler (SMAS). The key idea is thread-group scheduling, which partitions threads by memory address space to reduce context-switching overhead while giving each thread a fair chance to occupy CPU time. There are three main contributions: 1) SMAS balances system performance and fairness among all threads; 2) to our knowledge, this is the first attempt to use a shared-memory-aware scheduler to improve system performance; 3) we implement SMAS on both a testbed and a simulator for evaluation. The testbed results on a 2-core processor show that our scheduler improves several performance metrics with negligible loss of fairness, reducing the cache miss rate by up to 0.128%, run time by up to 2.62%, DTLB misses by up to 13.15%, ITLB misses by up to 31.68%, and ITLB flushes by up to 46.15%. Furthermore, our extensive simulation results for 4 and 8 cores demonstrate that SMAS is highly scalable.
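To illustrate the grouping idea stated in the abstract, the following is a minimal user-level sketch, not the authors' implementation: it assumes each thread carries an address-space identifier, and the hypothetical helper `build_group_schedule` orders the run queue so that threads sharing an address space run back-to-back, which is the property that lets consecutive context switches avoid ITLB flushes.

```c
#include <stdio.h>

#define MAX_THREADS 16

/* Hypothetical user-level model of a thread: an id plus the id of the
 * address space (process) it belongs to. Threads with the same
 * addr_space value share memory. */
typedef struct {
    int tid;
    int addr_space;
} thread_t;

/* Build a run order that keeps threads of the same address space adjacent.
 * Consecutive switches then stay within one address space, the effect the
 * abstract attributes to thread-group scheduling. O(n^2) for clarity. */
static void build_group_schedule(const thread_t *ts, int n, int *order)
{
    int placed[MAX_THREADS] = {0};
    int k = 0;

    for (int i = 0; i < n; i++) {
        if (placed[i])
            continue;
        /* place thread i, then every later thread sharing its address space */
        for (int j = i; j < n; j++) {
            if (!placed[j] && ts[j].addr_space == ts[i].addr_space) {
                placed[j] = 1;
                order[k++] = j;
            }
        }
    }
}

int main(void)
{
    /* four threads from two address spaces, interleaved the way a naive
     * round-robin scheduler might pick them */
    thread_t ts[] = {{1, 100}, {2, 200}, {3, 100}, {4, 200}};
    int n = 4, order[MAX_THREADS];

    build_group_schedule(ts, n, order);

    for (int k = 0; k < n; k++)
        printf("run tid %d (addr space %d)\n",
               ts[order[k]].tid, ts[order[k]].addr_space);
    return 0;
}
```

Under this ordering the switch from tid 1 to tid 3 (and from tid 2 to tid 4) stays within one address space, so no TLB flush is required, while fairness is preserved in the sense that every thread still appears exactly once per scheduling round.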