Optimal task assignment in multithreaded processors: a statistical approach

Authors:
Petar Radojković;Vladimir Čakarević;Miquel Moretó;Javier Verdú;Alex Pajuelo;Francisco J. Cazorla;Mario Nemirovsky;Mateo Valero
Affiliations:
Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain;Universitat Politecnica de Catalunya & Barcelona Supercomputing Center, Barcelona, Spain;Universitat Politecnica de Catalunya, Barcelona, Spain;Universitat Politecnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center & Spanish National Research Council (IIIA-CSIC), Barcelona, Spain;ICREA Research Professor at Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center & Universitat Politecnica de Catalunya, Barcelona, Spain
Venue:
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Year:
2012

Abstract

The introduction of massively multithreaded (MMT) processors, composed of a large number of cores with many shared resources, has made task scheduling, and in particular the assignment of tasks to hardware threads, one of the most promising ways to improve system performance. However, finding an optimal task assignment for a workload running on MMT processors is an NP-complete problem. Because the performance of the best possible task assignment is unknown, the room for improvement of current task-assignment algorithms cannot be determined. This is a major problem for the industry because it could lead to: (1) a waste of resources, if excessive effort is devoted to improving a task-assignment algorithm that already performs close to the optimum, or (2) a significant performance loss, if insufficient effort is devoted to improving poorly performing task-assignment algorithms. In this paper, we present a method based on Extreme Value Theory that predicts the performance of the optimal task assignment in MMT processors. We further show that executing a sample of several hundred or several thousand random task assignments is enough to obtain, with very high confidence, an assignment whose performance is close to the optimal one. We validate our method with an industrial case study for a set of multithreaded network applications running on an UltraSPARC T2 processor.
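
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `tail_fraction` parameter, and the choice of a method-of-moments fit for the generalized Pareto distribution are assumptions made for illustration. The first function uses the standard sampling argument that n independent random assignments contain at least one from the best fraction p with probability 1 - (1 - p)^n. The second applies a peaks-over-threshold estimate, one common Extreme Value Theory technique, to guess the performance of the optimal assignment from a measured sample.

```python
import math
import statistics


def sample_size(top_fraction, confidence):
    """Smallest n such that 1 - (1 - top_fraction)**n >= confidence.

    top_fraction: share of all assignments regarded as "close to optimal"
    (a hypothetical definition chosen for this sketch).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


def estimate_optimum(performances, tail_fraction=0.1):
    """Peaks-over-threshold estimate of the best achievable performance.

    Fits a generalized Pareto distribution to the excesses over a high
    threshold using the method of moments (an assumption of this sketch).
    Returns None when the fit does not imply a finite upper bound.
    """
    ordered = sorted(performances)
    k = max(3, int(len(ordered) * tail_fraction))  # number of tail points
    if len(ordered) <= k:
        raise ValueError("sample too small for the chosen tail_fraction")
    threshold = ordered[-k - 1]
    excesses = [x - threshold for x in ordered[-k:]]
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    if var == 0.0:
        return None  # degenerate tail, nothing to fit
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)        # GPD shape parameter
    scale = 0.5 * mean * (ratio + 1.0)  # GPD scale parameter
    if shape >= 0.0:
        return None  # heavy or exponential tail: no finite endpoint
    return threshold - scale / shape    # upper endpoint of the fitted GPD
```

Under this argument, `sample_size(0.01, 0.99)` gives 459 and `sample_size(0.001, 0.99)` gives 4603, which is in line with the "several hundred or several thousand" random assignments mentioned in the abstract. The gap between `estimate_optimum(sample)` and `max(sample)` then suggests how much room for improvement remains.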