Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization

Authors:
Alain Ketterlin;Philippe Clauss
Affiliations:
-;-
Venue:
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2012


Abstract

This paper describes a tool using one or more executions of a sequential program to detect parallel portions of the program. The tool, called Parwiz, uses dynamic binary instrumentation, targets various forms of parallelism, and suggests distinct parallelization actions, ranging from simple directive tagging to elaborate loop transformations. The first part of the paper details the link between the program's static structures (like routines and loops), the memory accesses performed by the program, and the dependencies that are used to highlight potential parallelism. This part also describes the instrumentation involved, and the general architecture of the system. The second part of the paper puts the framework into action. The first study focuses on loop parallelism, targeting OpenMP parallel-for directives, including privatization when necessary. The second study is an adaptation of a well-known vectorization technique based on a slightly richer dependence description, where the tool suggests an elaborate loop transformation. The third study views loops as a graph of (hopefully lightly) dependent iterations. The third part of the paper explains how the overall cost of data-dependence profiling can be reduced. This cost has two major causes: first, instrumenting memory accesses slows down the program, and second, turning memory accesses into dependence graphs consumes processing time. Parwiz uses static analysis of the original (binary) program to provide data at a coarser level, moving from individual accesses to complete loops whenever possible, thereby reducing the impact of both sources of inefficiency.
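
To make the idea of turning memory accesses into dependences concrete, the following is a minimal sketch of how a recorded access trace for a single loop could be scanned for loop-carried dependences. It is not the tool's actual implementation: the `Access` record, the base addresses `A`, `B`, `T`, and the three trace generators are hypothetical names introduced for illustration, and the per-address tables are a generic shadow-memory scheme assumed here. A loop with no carried dependence is a candidate for a parallel-for directive; a loop whose only carried dependences are write-after-write or write-after-read on a temporary is a candidate for privatization.

```python
from collections import namedtuple

# Hypothetical trace record: loop iteration, accessed address, access kind.
Access = namedtuple("Access", ["iteration", "address", "is_write"])

def loop_carried_dependences(trace):
    """Return the set of (kind, source_iteration, sink_iteration) tuples
    for dependences whose two accesses belong to different iterations."""
    last_write = {}  # address -> iteration of the most recent write
    last_reads = {}  # address -> iterations that read since that write
    deps = set()
    for acc in trace:
        writer = last_write.get(acc.address)
        if acc.is_write:
            if writer is not None and writer != acc.iteration:
                deps.add(("WAW", writer, acc.iteration))
            for reader in last_reads.get(acc.address, ()):
                if reader != acc.iteration:
                    deps.add(("WAR", reader, acc.iteration))
            last_write[acc.address] = acc.iteration
            last_reads[acc.address] = set()
        else:
            if writer is not None and writer != acc.iteration:
                deps.add(("RAW", writer, acc.iteration))
            last_reads.setdefault(acc.address, set()).add(acc.iteration)
    return deps

# Hypothetical base addresses of two arrays and one scalar temporary.
A, B, T = 1000, 2000, 3000

def trace_independent(n):
    """Accesses of: for i in range(n): a[i] = b[i] + 1"""
    trace = []
    for i in range(n):
        trace.append(Access(i, B + i, False))
        trace.append(Access(i, A + i, True))
    return trace

def trace_recurrence(n):
    """Accesses of: for i in range(1, n): a[i] = a[i-1] + 1"""
    trace = []
    for i in range(1, n):
        trace.append(Access(i, A + i - 1, False))
        trace.append(Access(i, A + i, True))
    return trace

def trace_temporary(n):
    """Accesses of: for i in range(n): t = b[i]; a[i] = t"""
    trace = []
    for i in range(n):
        trace.append(Access(i, B + i, False))
        trace.append(Access(i, T, True))
        trace.append(Access(i, T, False))
        trace.append(Access(i, A + i, True))
    return trace

def kinds(deps):
    """Collapse a dependence set to the set of dependence kinds it contains."""
    return {kind for kind, _, _ in deps}
```

Under these assumptions, `trace_independent` yields no carried dependence, `trace_recurrence` yields a read-after-write chain between consecutive iterations, and `trace_temporary` yields only WAW and WAR dependences on `T`, which disappear once each iteration gets its own copy of the temporary.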