Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Authors:
Antonia Zhai; Christopher B. Colohan; J. Gregory Steffan; Todd C. Mowry
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

References (25)

Efficient context-sensitive pointer analysis for C programs
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture

Trace processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture

A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture

Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
IEEE Transactions on Parallel and Distributed Systems

Redundant Synchronization Elimination for DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems

Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing

The Superthreaded Processor Architecture
IEEE Transactions on Computers

Value prediction for speculative multithreaded architectures
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture

Dynamic points-to sets: a comparison with static analyses and potential applications in program understanding and optimization
PASTE '01 Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering

Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Optimizing Supercompilers for Supercomputers

Dependence Analysis for Supercomputing

Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

The MIPS R10000 Superscalar Microprocessor
IEEE Micro

Compiler support for speculative multithreading architecture with probabilistic points-to analysis
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming

A fast approximate interprocedural analysis for speculative multithreading compilers
ICS '03 Proceedings of the 17th annual international conference on Supercomputing

Speculative Versioning Cache
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture

In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Improving Value Communication for Thread-Level Speculation
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Tracking pointers with path and context sensitivity for bug detection in C programs
Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering

Cited by (18)

Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques

The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)

Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture

A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)

Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)

Modeling optimistic concurrency using quantitative dependence analysis
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Compiler optimizations for parallelizing general-purpose applications under thread-level speculation
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)

Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture

Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

A method of computation decomposition on tightly-nested loop automatic parallelization
IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application

Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

HiRe: using hint & release to improve synchronization of speculative threads
Proceedings of the 26th ACM international conference on Supercomputing

Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)

The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution
ACM Transactions on Architecture and Code Optimization (TACO)

A thread partitioning approach for speculative multithreading
The Journal of Supercomputing

Abstract

Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication using hardware support have been proposed, there is relatively little work on exploiting the potential of compiler optimization. Building on recent research on compiler optimization of scalar value communication between speculative threads, we propose compiler techniques for the optimization of memory-resident values.

In TLS, data dependences through memory-resident values are tracked by the underlying hardware and preserved by re-executing any speculative thread that violates a dependence; however, re-execution incurs a large performance penalty and should be used only to resolve data dependences that are infrequent. In contrast, value communication for frequently occurring data dependences must be very efficient.

In this paper, we propose using the compiler to first identify frequently occurring memory-resident data dependences, then insert synchronization for communicating values to preserve these dependences. We find that by synchronizing frequently occurring data dependences we can significantly improve the efficiency of parallel execution. A comparison between compiler-inserted and hardware-inserted memory synchronization reveals that the two techniques are complementary, with each technique benefiting different benchmarks.
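The selection step described above (find the frequent memory-resident dependences, synchronize those, leave the rare ones to hardware re-execution) can be illustrated with a small profiling sketch. This is not the authors' implementation; it assumes a memory-access trace recorded in sequential program order and tagged with epoch (speculative thread) numbers, and the `Access` record, the site labels, and the 5% threshold are hypothetical choices made only for illustration. Each cross-epoch read-after-write pair is counted at most once per consuming epoch; pairs that occur in more than the threshold fraction of epochs are the candidates for compiler-inserted synchronization (conceptually, a wait for the forwarded value before the load and a signal after the store).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

class Access(NamedTuple):
    epoch: int  # speculative thread (loop iteration) that performs the access
    kind: str   # "load" or "store"
    site: str   # static instruction label (hypothetical naming scheme)
    addr: int   # memory address accessed

Pair = Tuple[str, str]  # (store site, load site)

def profile_dependences(trace: Iterable[Access]) -> Tuple[Dict[Pair, int], int]:
    """Count cross-epoch read-after-write dependences per (store, load) site pair.

    The trace must be in sequential program order. Each pair is counted at most
    once per consuming epoch. Returns (counts, number of distinct epochs).
    """
    last_store: Dict[int, Tuple[int, str]] = {}  # addr -> (epoch, site) of latest store
    seen: Set[Tuple[str, str, int]] = set()      # (store site, load site, consumer epoch)
    counts: Dict[Pair, int] = defaultdict(int)
    epochs: Set[int] = set()
    for a in trace:
        epochs.add(a.epoch)
        if a.kind == "store":
            last_store[a.addr] = (a.epoch, a.site)
        elif a.kind == "load" and a.addr in last_store:
            producer_epoch, store_site = last_store[a.addr]
            key = (store_site, a.site, a.epoch)
            # Only values produced by an earlier epoch cross a thread boundary.
            if producer_epoch < a.epoch and key not in seen:
                seen.add(key)
                counts[(store_site, a.site)] += 1
    return dict(counts), len(epochs)

def select_for_sync(counts: Dict[Pair, int], num_epochs: int,
                    threshold: float = 0.05) -> List[Pair]:
    """Pairs occurring in more than `threshold` of epochs get compiler-inserted
    synchronization; the rest are left to hardware speculation (hypothetical threshold)."""
    return sorted(p for p, n in counts.items() if n / num_epochs > threshold)

# Hypothetical loop: every iteration reads and updates a shared counter at 0x10,
# and only iterations 1 and 51 read a table slot at 0x20 written one iteration earlier.
trace: List[Access] = []
for i in range(100):
    trace.append(Access(i, "load", "L_count", 0x10))
    trace.append(Access(i, "store", "S_count", 0x10))
    if i % 50 == 0:
        trace.append(Access(i, "store", "S_table", 0x20))
    if i % 50 == 1:
        trace.append(Access(i, "load", "L_table", 0x20))

counts, num_epochs = profile_dependences(trace)
print(counts)                               # {('S_count', 'L_count'): 99, ('S_table', 'L_table'): 2}
print(select_for_sync(counts, num_epochs))  # [('S_count', 'L_count')]
```

With these made-up numbers, the counter dependence appears in 99 of 100 epochs and would be synchronized, while the table dependence (2 of 100 epochs) would be left to speculation and occasional re-execution. A fuller cost model could also weigh the stall time of a wait against the expected cost of squashing a thread; that trade-off is not modeled in this sketch.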