Using thread-level speculation to simplify manual parallelization

Authors:
Manohar K. Prabhu;Kunle Olukotun
Affiliations:
Stanford University, Stanford, California;Stanford University, Stanford, California
Venue:
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2003

Citing 19
Cited 29

An integrated compilation and performance analysis environment for data parallel programs

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Run-time methods for parallelizing partially parallel loops

ICS '95 Proceedings of the 9th international conference on Supercomputing
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor

Proceedings of the 25th annual international symposium on Computer architecture
Using value prediction to increase the power of speculative execution hardware

ACM Transactions on Computer Systems (TOCS)
SUIF Explorer: an interactive and interprocedural parallelizer

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving the performance of speculatively parallel applications on the Hydra CMP

ICS '99 Proceedings of the 13th international conference on Supercomputing
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor

ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Silent Stores and Store Value Locality

IEEE Transactions on Computers
Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Paradyn Parallel Performance Measurement Tool

Computer
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Min-cut program decomposition for thread-level speculation

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Programming with transactional coherence and consistency (TCC)

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
An Application Analysis Framework For Polymorphic Chip Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Exposing speculative thread parallelism in SPEC2000

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Optimistic intra-transaction parallelism on chip multiprocessors

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Chip multi-processor scalability for single-threaded applications

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Dynamic parallelization and mapping of binary executables on hierarchical platforms

Proceedings of the 3rd conference on Computing frontiers
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Proceedings of the 33rd annual international symposium on Computer Architecture
Executing Java programs with transactional memory

Science of Computer Programming - Special issue: Synchronization and concurrency in object-oriented languages
Implicit parallelism with ordered transactions

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
A compiler cost model for speculative parallelization

ACM Transactions on Architecture and Code Optimization (TACO)
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

Journal of Parallel and Distributed Computing
Split hardware transactions: true nesting of transactions using best-effort hardware transactional memory

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Spice: speculative parallel iteration chunk execution

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiler and hardware support for reducing the synchronization of speculative threads

ACM Transactions on Architecture and Code Optimization (TACO)
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
A type and effect system for deterministic parallel Java

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
81.6 GOPS object recognition processor based on a memory-centric NoC

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploiting speculative thread-level parallelism in data compression applications

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Parallel programming must be deterministic by default

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
A pattern language for parallelizing irregular algorithms

Proceedings of the 2010 Workshop on Parallel Programming Patterns
Hardware-enabled dynamic resource allocation for manycore systems using bidding-based system feedback

EURASIP Journal on Embedded Systems
Alias control for deterministic parallelism

Aliasing in Object-Oriented Programming
A thread partitioning approach for speculative multithreading

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we provide examples of how thread-level speculation(TLS) simplifies manual parallelization and enhances itsperformance. A number of techniques for manual parallelizationusing TLS are presented and results are provided that indicate theperformance contribution of each technique on seven SPEC CPU2000benchmark applications. We also provide indications of theprogramming effort required to parallelize each benchmark. TLSparallelization yielded a 110% speedup on our four floating pointapplications and a 70% speedup on our three integer applications,while requiring only approximately 80 programmer hours and 150lines of non-template code per application. These results supportthe idea that manual parallelization using TLS is an efficient wayto extract fine-grain thread-level parallelism.