Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications
IEEE Transactions on Computers
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
On the limits of program parallelism and its smoothability
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Measuring limits of parallelism and characterizing its vulnerability to resource constraints
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
The limits of instruction level parallelism in SPEC95 applications
ACM SIGARCH Computer Architecture News - Special issue on Interact-3 workshop
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Cost effective dynamic program slicing
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Cost and precision tradeoffs of dynamic data slicing algorithms
ACM Transactions on Programming Languages and Systems (TOPLAS)
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Auto-vectorization of interleaved data for SIMD
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Enabling tracing Of long-running multithreaded programs via dynamic execution reduction
Proceedings of the 2007 international symposium on Software testing and analysis
Unified control flow and data dependence traces
ACM Transactions on Architecture and Code Optimization (TACO)
Measuring the Parallelism Available for Very Long Instruction Word Architectures
IEEE Transactions on Computers
Revisiting the Sequential Programming Model for Multi-Core
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Set-Congruence Dynamic Analysis for Thread-Level Speculation (TLS)
Languages and Compilers for Parallel Computing
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Profiling Java programs for parallelism
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Exploiting Parallelism with Dependence-Aware Scheduling
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
New algorithms for SIMD alignment
CC'07 Proceedings of the 16th international conference on Compiler construction
Understanding parallelism-inhibiting dependences in sequential Java programs
ICSM '10 Proceedings of the 2010 IEEE International Conference on Software Maintenance
Kremlin: rethinking and rebooting gprof for the multicore age
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Limits of parallelism using dynamic dependency graphs
WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Vector seeker: a tool for finding vector potential
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical for achieving high performance on emerging and future architectures. A vast majority of existing applications were developed without any attention by their developers towards effective vectorizability of the codes. While developers of production compilers such as GNU gcc, Intel icc, PGI pgcc, and IBM xlc have invested considerable effort and made significant advances in enhancing automatic vectorization capabilities, these compilers still cannot effectively vectorize many existing scientific and engineering codes. It is therefore of considerable interest to analyze existing applications to assess the inherent latent potential for SIMD parallelism, exploitable through further compiler advances and/or via manual code changes. In this paper we develop an approach to infer a program's SIMD parallelization potential by analyzing the dynamic data-dependence graph derived from a sequential execution trace. By considering only the observed run-time data dependences for the trace, and by relaxing the execution order of operations to allow any dependence-preserving reordering, we can detect potential SIMD parallelism that may otherwise be missed by more conservative compile-time analyses. We show that for several benchmarks our tool discovers regions of code within computationally-intensive loops that exhibit high potential for SIMD parallelism but are not vectorized by state-of-the-art compilers. We present several case studies of the use of the tool, both in identifying opportunities to enhance the transformation capabilities of vectorizing compilers, as well as in pointing to code regions to manually modify in order to enable auto-vectorization and performance improvement by existing compilers.