In processors with several levels of hardware resource sharing, such as CMPs in which each core is an SMT, scheduling is more complex than in processors with a single level of resource sharing, such as pure-SMT or pure-CMP processors. Once the operating system selects the set of applications to schedule simultaneously on the processor (the workload), each application or thread must be assigned to one of the hardware contexts (strands). We call this last scheduling step the Thread to Strand Binding, or TSB. In this paper, we show that the TSB has a high impact on the performance of processors with several levels of shared resources. We measure a variation of up to 59% between different TSBs of real multithreaded network applications running on the UltraSPARC T2 processor, which has three levels of resource sharing. In our view, this problem will become more acute in future multithreaded architectures comprising more cores, more contexts per core, and more levels of resource sharing. We propose a resource-sharing-aware TSB algorithm (TSBSched) that significantly simplifies thread to strand binding for software-pipelined applications, which are representative of multithreaded network applications. Our systematic approach captures both the characteristics of the multithreaded processor under study and the structure of the software-pipelined applications. Once calibrated for a given processor architecture, our proposal requires neither hardware knowledge on the part of the programmer nor extensive profiling of the application. We validate our algorithm on the UltraSPARC T2 processor running a set of real multithreaded network applications, on which we report improvements of up to 46% compared to current state-of-the-art dynamic schedulers.
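To make the binding problem concrete, the following minimal Python sketch enumerates every thread to strand binding for a small software pipeline on a toy three-level topology and compares the best and worst bindings. It is not TSBSched: the topology sizes, the `CONTENTION` and `COMMUNICATION` penalty tables, the per-thread `intensity` values, and the exhaustive search are all assumptions chosen for illustration, and a real calibration would come from measurements on the target processor.

```python
from itertools import permutations

# Toy topology (an assumption, much smaller than a real UltraSPARC T2):
# 2 cores x 2 pipelines per core x 2 strands per pipeline = 8 strands.
CORES, PIPES_PER_CORE, STRANDS_PER_PIPE = 2, 2, 2
TOTAL_STRANDS = CORES * PIPES_PER_CORE * STRANDS_PER_PIPE

# Hypothetical penalties per sharing level (not measured values).
# Threads that share more hardware contend more, but communicate more cheaply.
CONTENTION = {"pipe": 3.0, "core": 1.0, "chip": 0.0}
COMMUNICATION = {"pipe": 0.0, "core": 1.0, "chip": 2.0}


def strand_location(strand):
    """Map a flat strand id to its (core, pipeline) coordinates."""
    pipe = strand // STRANDS_PER_PIPE
    core = pipe // PIPES_PER_CORE
    return core, pipe


def sharing_level(s1, s2):
    """Return the closest hardware level shared by two strands."""
    core1, pipe1 = strand_location(s1)
    core2, pipe2 = strand_location(s2)
    if pipe1 == pipe2:
        return "pipe"
    if core1 == core2:
        return "core"
    return "chip"


def binding_cost(binding, intensity, stage_links):
    """Score a binding; binding[i] is the strand assigned to thread i."""
    cost = 0.0
    for i in range(len(binding)):
        for j in range(i + 1, len(binding)):
            level = sharing_level(binding[i], binding[j])
            # Resource contention between every pair of co-running threads.
            cost += CONTENTION[level] * intensity[i] * intensity[j]
            # Communication cost only between linked pipeline stages.
            if (i, j) in stage_links:
                cost += COMMUNICATION[level]
    return cost


def search_bindings(intensity, stage_links):
    """Exhaustively score all bindings; return (best, worst) as (cost, binding)."""
    scored = [
        (binding_cost(b, intensity, stage_links), b)
        for b in permutations(range(TOTAL_STRANDS), len(intensity))
    ]
    return min(scored), max(scored)


# A four-stage software pipeline: thread 0 -> 1 -> 2 -> 3.
intensity = [1.0, 0.5, 1.0, 0.5]        # hypothetical resource demand per thread
stage_links = {(0, 1), (1, 2), (2, 3)}  # adjacent stages exchange data
(best_cost, best), (worst_cost, worst) = search_bindings(intensity, stage_links)
print("best binding :", best, "cost", best_cost)
print("worst binding:", worst, "cost", worst_cost)
```

Even in this toy setting, four threads on eight strands already yield 1,680 candidate bindings, and the count grows quickly with more strands and threads. This is why an approach that avoids exhaustive enumeration and extensive profiling becomes attractive on larger processors.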