Dynamic performance tuning for speculative threads

Authors:
Yangchun Luo;Venkatesan Packirisamy;Wei-Chung Hsu;Antonia Zhai;Nikhil Mungre;Ankit Tarkas
Affiliations:
University of Minnesota - Twin Cities, Minneapolis, MN, USA;University of Minnesota - Twin Cities, Minneapolis, MN, USA;University of Minnesota - Twin Cities, Minneapolis, MN, USA;University of Minnesota - Twin Cities, Minneapolis, MN, USA;University of Minnesota - Twin Cities, Minneapolis, MN, USA;University of Minnesota - Twin Cities, Minneapolis, MN, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009

Citing 45
Cited 9

The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Speculative multithreaded processors

ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture

IEEE Transactions on Computers
An architecture for mostly functional languages

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
A study of slipstream processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Transactional lock-free execution of lock-based programs

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Design and evaluation of compiler algorithms for pre-execution

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
OpenMP: An Industry-Standard API for Shared-Memory Programming

IEEE Computational Science & Engineering
An infrastructure for adaptive dynamic optimization

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Control Speculation in Multithreaded Processors through Dynamic Loop Detection

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Dynamic native optimization of interpreters

Proceedings of the 2003 workshop on Interpreters, virtual machines and emulators
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Min-cut program decomposition for thread-level speculation

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Programming with transactional coherence and consistency (TCC)

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Thread-Level Speculation on a CMP can be energy efficient

Proceedings of the 19th annual international conference on Supercomputing
An Event-Driven Multithreaded Dynamic Optimization Framework

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Online Phase Detection Algorithms

Proceedings of the International Symposium on Code Generation and Optimization
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A performance counter architecture for computing accurate CPI components

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Speculative thread decomposition through empirical optimization

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Detecting phases in parallel applications on shared memory architectures

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

TAO: two-level atomicity for dynamic binary optimizations

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grain speculative parallelism

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Dynamic resource tuning for flexible core chip multiprocessors

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
LAR-CC: Large atomic regions with conditional commits

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Runtime automatic speculative parallelization

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Optimizing software runtime systems for speculative parallelization

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution

ACM Transactions on Architecture and Code Optimization (TACO)
Efficient execution of speculative threads and transactions with hardware transactional memory

Future Generation Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent threads to execute speculatively in parallel. While TLS offers significant performance potential for applications that are otherwise non-parallel, extracting efficient speculative threads in the presence of complex control flow and ambiguous data dependences is a real challenge. This task is further complicated by the fact that the performance of speculative threads is often architecture-dependent, input-sensitive, and exhibits phase behaviors. Thus we propose dynamic performance tuning mechanisms that determine where and how to create speculative threads at runtime. This paper describes the design, implementation, and evaluation of hardware and software support that takes advantage of runtime performance profiles to extract efficient speculative threads. In our proposed framework, speculative threads are monitored by hardware-based performance counters and their performance impact is estimated. The creation of speculative threads is adjusted based on the estimation. This paper proposes speculative threads performance estimation techniques, that are capable of correctly determining whether speculation can improve performance for loops that corresponds to 83.8% of total loop execution time across all benchmarks. This paper also examines several dynamic performance tuning policies and finds that the best tuning policy achieves an overall speedup of 36.8%on a set of benchmarks from SPEC2000 suite, which outperforms static thread management by 9.5%.