Speculative parallelization of partial reduction variables

Authors:
Liang Han;Wei Liu;James M. Tuck
Affiliations:
North Carolina State University, Raleigh, NC, USA;Intel Corp., Santa Clara, CA, USA;North Carolina State University, Raleigh, NC, USA
Venue:
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Year:
2010

Abstract

Reduction variables are an important class of cross-thread dependence that can be parallelized by exploiting the associativity and commutativity of their operation. In this paper, we define a class of shared variables called partial reduction variables (PRVs). These variables either cannot be proven to be reductions or violate the requirements of a reduction variable in some way. We describe an algorithm that allows the compiler to detect PRVs, and we discuss the requirements for parallelizing the PRVs it detects. Based on these requirements, we propose an implementation in a thread-level speculation (TLS) system that parallelizes PRVs through a combination of compile-time and hardware techniques. The compiler transforms the variable under the assumption that the reduction-like behavior proven statically will hold at runtime. However, if a thread reads or updates the shared variable as a result of an alias or an unlikely control path, a lightweight hardware mechanism detects the access and synchronizes it to ensure correct execution. We implement our compiler analysis and transformation in GCC and analyze its potential on the SPEC CPU 2000 benchmarks. We find that supporting PRVs provides up to a 46% performance gain over a highly optimized TLS system, and a 10.7% performance improvement on average.
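To make the idea of a partial reduction variable concrete, the following is a minimal C sketch and not the authors' GCC transformation or hardware mechanism. It assumes a hypothetical loop in which `sum` is updated like a reduction on the common path but is read on an unlikely path. The names `prv_sequential`, `prv_chunked`, `RARE_MARKER`, and `snapshot` are illustrative assumptions. The chunked version runs in a single thread and only emulates the scheme: each chunk accumulates into a private partial value, and the rare read is "synchronized" by combining the partial value with the total already committed by earlier chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value that triggers the unlikely control path. */
#define RARE_MARKER (-1)

/* Sequential reference. `sum` behaves like a reduction variable, except
 * that the rare branch reads its intermediate value, so a compiler cannot
 * treat it as a pure reduction. */
long prv_sequential(const int *a, size_t n, long *snapshot)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];                 /* reduction-like update */
        if (a[i] == RARE_MARKER)
            *snapshot = sum;         /* unlikely read of the running value */
    }
    return sum;
}

/* Single-threaded emulation of a privatized execution. Each chunk stands
 * in for one speculative thread with its own partial sum. In a real TLS
 * system the chunks would run concurrently and the rare read would have to
 * wait for earlier threads; here chunks run in order, so `committed`
 * already holds the merged result of all earlier chunks. */
long prv_chunked(const int *a, size_t n, size_t chunk, long *snapshot)
{
    assert(chunk > 0);
    long committed = 0;              /* merged result of finished chunks */
    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (n - start < chunk) ? n : start + chunk;
        long partial = 0;            /* private copy, starts at the identity */
        for (size_t i = start; i < end; i++) {
            partial += a[i];
            if (a[i] == RARE_MARKER)
                *snapshot = committed + partial;  /* emulated synchronization */
        }
        committed += partial;        /* merge step at chunk commit */
    }
    return committed;
}
```

Both functions return the same total and record the same snapshot for any chunk size, because the rare read always sees the committed prefix plus the local partial value. The caller is expected to initialize `*snapshot`, since it is only written when the marker appears.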