Architectural Support for Enhanced SMT Job Scheduling

Authors:
Alex Settle; Joshua Kihm; Andrew Janiszewski; Dan Connors
Affiliations:
University of Colorado at Boulder (all authors)
Venue:
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Year:
2004

Abstract

By converting thread-level parallelism to instruction-level parallelism, Simultaneous Multithreaded (SMT) processors are emerging as an effective way to utilize the resources of modern superscalar architectures. However, the full potential of SMT has not yet been reached, because most modern operating systems schedule threads with existing single-thread or multiprocessor algorithms and neglect contention for resources between threads. To date, even the best SMT scheduling algorithms simply try to group threads for co-residency based on each thread's expected resource utilization, without taking into account variance in thread behavior. We therefore introduce architectural support that enables new thread scheduling algorithms to group threads for co-residency based on fine-grained memory system activity information. The proposed memory monitoring framework centers on the concept of a cache activity vector, which exposes runtime cache resource information to the operating system to improve job scheduling. Using this scheduling technique, we experimentally evaluate the overall performance improvement of workloads on an SMT machine compared against the most recent Linux job scheduler. This work is first motivated with experiments in a simulated environment, then validated on a Hyper-Threading-enabled Intel Pentium 4 Xeon microprocessor running a modified version of the latest Linux kernel.
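The idea of scheduling from cache activity vectors can be illustrated with a minimal sketch. The abstract does not specify how the vector is encoded or how the scheduler compares vectors, so everything below is an assumption made for illustration: the vector is taken to be a bit mask with one bit per cache region, a bit is set when that region's access count reaches a hypothetical `threshold`, and the scheduler is assumed to co-schedule the pair of jobs whose vectors overlap least. The function names are hypothetical and do not come from the paper or from the Linux kernel.

```python
from itertools import combinations

def activity_vector(access_counts, threshold):
    """Pack per-region access counts into a bit mask.

    Assumption: bit i is set when cache region i saw at least
    `threshold` accesses during the last scheduling interval.
    """
    vec = 0
    for i, count in enumerate(access_counts):
        if count >= threshold:
            vec |= 1 << i
    return vec

def conflict(vec_a, vec_b):
    """Count the cache regions that both jobs use heavily."""
    return bin(vec_a & vec_b).count("1")

def pick_co_residents(vectors):
    """Return the pair of job names whose vectors overlap least.

    Ties are broken by the alphabetical order of the job names.
    """
    return min(
        combinations(sorted(vectors), 2),
        key=lambda pair: conflict(vectors[pair[0]], vectors[pair[1]]),
    )

# Made-up access counts for three jobs over four cache regions.
counts = {
    "A": [90, 80, 2, 1],
    "B": [85, 3, 70, 0],
    "C": [1, 2, 4, 95],
}
vectors = {name: activity_vector(c, threshold=50) for name, c in counts.items()}
print(pick_co_residents(vectors))  # ('A', 'C')
```

In this toy input, jobs A and B both stress region 0, so pairing either of them with C avoids the shared hot region. A real scheduler would also have to refresh the vectors as thread behavior changes over time, which is the variance the abstract points to.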