The effect of page allocation on caches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Performance counters and state sharing annotations: a unified approach to thread locality
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
IEEE Transactions on Parallel and Distributed Systems
Thread coloring: a scheduler proposal from user to hardware threads
ACM SIGOPS Operating Systems Review
Hyper-threading aware process scheduling heuristics
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Do commodity SMT processors need more OS research?
ACM SIGOPS Operating Systems Review
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
Proceedings of the 6th ACM conference on Computing frontiers
Resource-conscious scheduling for energy efficiency on multicore processors
Proceedings of the 5th European conference on Computer systems
ScaleUPC: a UPC compiler for multi-core systems
Proceedings of the Third Conference on Partitioned Global Address Space Programing Models
LOMARC — lookahead matchmaking for multi-resource coscheduling
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling
Future Generation Computer Systems
Hi-index | 0.00 |
Hyper-Threading Technology (HT) is Intel®'s implementation of Simultaneous Multi-Threading (SMT). HT delivers two logical processors that can execute different tasks simultaneously using shared hardware resources on the processor package. In general a multi-threaded application that scales well and is optimized on SMP systems should be able to benefit on systems with HT as well for most cases, without any changes, although the operating system (OS) needs HT-specific enhancements. Among those we found process scheduling is one of the most crucial enhancements required in the OS, and we have been seeking the optimal scheduling for HT, evaluating various ideas and algorithms. One of our finds is, to efficiently utilize such execution resources, severe resource contention against the same and limited execution resource should be avoided in a processor package. The OS can attempt to utilize the CPU execution resources in processor packages if it can monitor and predict how the processes and system utilize the CPU execution resources in the multiprocessor environment. We have implemented a supplementary functionality for the scheduler called "Micro-Architectural Scheduling Assist (MASA)" on Linux (2.4.18) chiefly as a user program, rather than in the scheduler itself. This is because we believe that the users can tune the system more effectively for various workloads if we clarify how it works as a distinct entity. Most of this problem and solution is generic; all we require is for the OS to support an API for processor affinity and to provide per-thread or per-process hardware event counts from the hardware performance monitoring counters at runtime.