URPR—An extension of URCR for software pipelining
MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation
The performance implications of thread management alternatives for shared-memory multiprocessors
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Introduction to algorithms
An efficient algorithm for a task allocation problem
Journal of the ACM (JACM)
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
VLIW compilation techniques in a superscalar environment
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
The definition of dependence distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Performance counters and state sharing annotations: a unified approach to thread locality
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Scheduling threads for low space requirement and good locality
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Dependence Analysis
Optimizing Supercompilers for Supercomputers
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Structure of Computers and Computations
Making Compaction-Based Parallelization Affordable
IEEE Transactions on Parallel and Distributed Systems
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Task allocation in distributed systems: A survey of practical strategies
ACM '82 Proceedings of the ACM '82 conference
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
Parallelism, memory anti-aliasing and correctness for trace scheduling compilers (disambiguation, flow-analysis, compaction)
Compaction-based parallelization
IEEE Transactions on Computers
Dual Processor Scheduling with Dynamic Reassignment
IEEE Transactions on Software Engineering
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
On the evaluation and extraction of thread-level parallelism in ordinary programs
Intel threading building blocks
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Techniques for efficient placement of synchronization primitives
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Thread-level program parallelization is key to exploiting the hardware parallelism of emerging multi-core systems. Several techniques have been proposed for program multithreading. However, existing techniques do not address the following key issues associated with multithreaded execution of a given program: (a) whether multithreaded execution is faster than sequential execution; and (b) how many threads to spawn during program multithreading. In this paper, we address these limitations. Specifically, we propose a novel approach, T-OPT, to determine how many threads to spawn during multithreaded execution of a given program region. This helps avoid both under-subscription and over-subscription of hardware resources, which in turn facilitates exploitation of a higher level of thread-level parallelism (TLP) than can be achieved using the state of the art. We show that, from a program-dependence standpoint, using more threads than the proposed approach advocates does not yield a higher degree of TLP. We present a couple of case studies and results on kernels, extracted from open-source codes, to demonstrate the efficacy of our techniques on a real machine.
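The abstract's core idea is that spawning more threads than either the program's dependence-limited parallelism or the hardware can sustain yields no additional TLP. The paper's T-OPT analysis is not reproduced here; the following is only a minimal sketch of that capping idea, assuming a dependence-limited TLP estimate for the region is supplied by some external analysis (the function name and parameters are illustrative, not from the paper):

```python
import os

def choose_thread_count(dependence_limited_tlp, hw_threads=None):
    """Pick a thread count for a program region.

    dependence_limited_tlp: estimated maximum useful parallelism of the
        region (e.g., from a dependence analysis) -- a hypothetical input.
    hw_threads: hardware thread count; defaults to the machine's.

    Spawning beyond min(TLP, hardware threads) over-subscribes resources
    without increasing exploited parallelism; spawning fewer under-subscribes.
    """
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1
    # Never spawn fewer than one thread, never more than either limit.
    return max(1, min(dependence_limited_tlp, hw_threads))
```

For example, a region whose dependences admit 16-way parallelism on an 8-thread machine would be capped at 8 threads, while a region with only 3-way parallelism would get 3 threads even on that same machine.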