Compatible phase co-scheduling on a CMP of multi-threaded processors

Authors:
Ali El-Moursy;Rajeev Garg;David H. Albonesi;Sandhya Dwarkadas
Affiliations:
Departments of Electrical and Computer Engineering and of Computer Science, University of Rochester;Departments of Electrical and Computer Engineering and of Computer Science, University of Rochester;Computer Systems Laboratory, Cornell University;Departments of Electrical and Computer Engineering and of Computer Science, University of Rochester
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

The industry is rapidly moving towards the adoption of Chip Multi-Processors (CMPs) of Simultaneous Multi-Threaded (SMT) cores for general purpose systems. The most prominent use of such processors, at least in the near term, will be as job servers running multiple independent threads on the different contexts of the various SMT cores. In such an environment, the co-scheduling of phases from different threads plays a significant role in the overall throughput. Less throughput is achieved when phases from different threads that conflict for particular hardware resources are scheduled together, compared with the situation where compatible phases are co-scheduled on the same SMT core. Achieving the latter requires precise per-phase hardware statistics that the scheduler can use to rapidly identify possible incompatibilities among phases of different threads, thereby avoiding the potentially high performance cost of inter-thread contention. In this paper, we devise phase co-scheduling policies for a dual-core CMP of dual-threaded SMT processors. We explore a number of approaches and find that the use of ready and in-flight instruction metrics permits effective co-scheduling of compatible phases among the four contexts. This approach significantly outperforms the worst static grouping of threads, and very closely matches the best static grouping, even outperforming it by as much as 7%.
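To make the co-scheduling idea concrete, the following is a minimal sketch, not the authors' policy. It assumes each of the four threads reports a single hypothetical per-phase number (called `metric` here, standing in for something like an average count of ready or in-flight instructions), and it assumes that a lower combined value on a core means less contention. The function names, the example values, and the "minimize the busier core" rule are all illustrative assumptions; the abstract does not specify how the metrics are combined.

```python
def pairings(threads):
    """Yield every way to split four threads into two pairs (one pair per core)."""
    first = threads[0]
    for partner in threads[1:]:
        core0 = (first, partner)
        core1 = tuple(t for t in threads if t not in core0)
        yield core0, core1


def pressure(pair, metric):
    """Hypothetical contention estimate: sum of the pair's per-phase metric values."""
    return sum(metric[t] for t in pair)


def choose_pairing(metric):
    """Pick the pairing whose more heavily loaded core has the lowest pressure.

    `metric` maps a thread name to an assumed per-phase statistic. This
    balancing rule is an illustration, not the policy evaluated in the paper.
    """
    threads = sorted(metric)
    return min(
        pairings(threads),
        key=lambda p: max(pressure(p[0], metric), pressure(p[1], metric)),
    )


# Made-up values for the current phase of four threads.
example_metric = {"A": 30, "B": 28, "C": 5, "D": 7}
best = choose_pairing(example_metric)
print(best)
```

With four contexts there are only three possible pairings, so exhaustive comparison is cheap. A phase-aware scheduler along these lines would re-run the choice whenever a thread's phase, and therefore its metric, changes.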