Dynamic trace-based analysis of vectorization potential of applications

  • Authors:
  • Justin Holewinski, Ragavendar Ramamurthi, Mahesh Ravishankar, Naznin Fauzia, Louis-Noël Pouchet, Atanas Rountev, P. Sadayappan

  • Affiliation:
  • Ohio State University, Columbus, OH, USA (all authors)

  • Venue:
  • Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12)
  • Year:
  • 2012


Abstract

Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical for achieving high performance on emerging and future architectures. The vast majority of existing applications were developed without attention to the effective vectorizability of the code. While developers of production compilers such as GNU gcc, Intel icc, PGI pgcc, and IBM xlc have invested considerable effort and made significant advances in enhancing automatic vectorization capabilities, these compilers still cannot effectively vectorize many existing scientific and engineering codes. It is therefore of considerable interest to analyze existing applications to assess the inherent latent potential for SIMD parallelism, exploitable through further compiler advances and/or via manual code changes. In this paper we develop an approach to infer a program's SIMD parallelization potential by analyzing the dynamic data-dependence graph derived from a sequential execution trace. By considering only the observed run-time data dependences for the trace, and by relaxing the execution order of operations to allow any dependence-preserving reordering, we can detect potential SIMD parallelism that may otherwise be missed by more conservative compile-time analyses. We show that for several benchmarks our tool discovers regions of code within computationally intensive loops that exhibit high potential for SIMD parallelism but are not vectorized by state-of-the-art compilers. We present several case studies of the use of the tool, both in identifying opportunities to enhance the transformation capabilities of vectorizing compilers and in pointing to code regions that can be manually modified to enable auto-vectorization and performance improvement by existing compilers.
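
To make the core idea concrete, the following is a minimal sketch (not the authors' tool) of estimating latent SIMD parallelism from a dynamic data-dependence graph built from a sequential execution trace. The trace representation, the `simd_potential` function, and the width metric are hypothetical illustrations: the sketch assumes a trace of (statement id, destination address, source addresses) events, tracks only flow (read-after-write) dependences for brevity, assigns each dynamic operation a dependence depth, and treats instances of the same static statement at the same depth as SIMD-packable lanes.

```python
"""
Minimal sketch, assuming a simplified trace format; the paper's actual
tool and metrics may differ substantially.
"""
from collections import defaultdict

def simd_potential(trace):
    """Estimate average SIMD width from a sequential trace.

    trace: list of (stmt_id, dest_addr, [src_addrs]) dynamic operations,
    in program execution order.
    """
    last_writer = {}          # address -> index of the op that last wrote it
    level = []                # dependence depth of each dynamic op
    lanes = defaultdict(int)  # (stmt_id, depth) -> packable instance count

    for i, (stmt, dest, srcs) in enumerate(trace):
        # Depth = 1 + max depth of producers; ops with no producers get 0.
        # This "relaxes" execution order to any dependence-preserving one.
        deps = [last_writer[a] for a in srcs if a in last_writer]
        d = 1 + max((level[j] for j in deps), default=-1)
        level.append(d)
        last_writer[dest] = i
        lanes[(stmt, d)] += 1

    # Average width: dynamic ops per (statement, level) slot.
    return len(trace) / len(lanes) if lanes else 0.0

# a[i] = b[i] + c[i] over 8 iterations: no cross-iteration
# dependences, so all 8 instances share one level -> width 8.
trace = [("S1", 100 + i, [200 + i, 300 + i]) for i in range(8)]
print(simd_potential(trace))  # -> 8.0

# s += a[i] (a reduction): each iteration reads and writes address
# 400, chaining the instances across levels -> width 1.
trace = [("S2", 400, [400, 200 + i]) for i in range(8)]
print(simd_potential(trace))  # -> 1.0
```

A realistic analysis along these lines would also have to track anti and output dependences (or rename storage to eliminate them), distinguish vectorizable reductions, and account for alignment and contiguity of memory accesses; the sketch only illustrates how relaxing the sequential order over an observed dependence graph exposes parallelism that a conservative static analysis may miss.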