Balancing thread partition for efficiently exploiting speculative thread-level parallelism

Authors:
Yaobin Wang;Hong An;Bo Liang;Li Wang;Ming Cong;Yongqing Ren
Affiliations:
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China and Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beij ... (listed for all six authors)
Venue:
APPT'07 Proceedings of the 7th international conference on Advanced parallel processing technologies
Year:
2007

Abstract

General-purpose computing is taking an irreversible step toward on-chip parallel architectures. One way to improve the performance of chip multiprocessors is thread-level speculation (TLS). Identifying the points at which speculative threads are spawned is one of the critical issues for this kind of architecture. In this paper, we present a criterion for selecting the regions to be executed speculatively, in order to identify potential sources of speculative parallelism in general-purpose programs. We provide a dynamic profiling method that searches a large space of TLS parallelization schemes and locates where parallelism lies within an application. We analyze the key factors affecting speculative thread-level parallelism in SPEC CPU2000, evaluate whether a given application, or parts of it, is suitable for TLS, and study how to balance the thread partition to exploit speculative thread-level parallelism efficiently. The results show that inter-thread data dependences are ubiquitous and that a synchronization mechanism is necessary; return value prediction and loop unrolling are important for improving performance. The information obtained can be used to guide thread partitioning for TLS.