PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Points-to analysis in almost linear time
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
PASTE '01 Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
MPI-The Complete Reference, Volume 1: The MPI Core
MPI-The Complete Reference, Volume 1: The MPI Core
Evaluating the precision of static reference analysis using profiling
ISSTA '02 Proceedings of the 2002 ACM SIGSOFT international symposium on Software testing and analysis
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Points-to Analysis by Type Inference of Programs with Structures and Unions
CC '96 Proceedings of the 6th International Conference on Compiler Construction
An infrastructure for adaptive dynamic optimization
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Precise dynamic slicing algorithms
Proceedings of the 25th International Conference on Software Engineering
VPC3: a fast and effective trace-compression algorithm
Proceedings of the joint international conference on Measurement and modeling of computer systems
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic slicing long running programs through execution fast forwarding
Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Shadow Profiling: Hiding Instrumentation Costs with Parallelism
Proceedings of the International Symposium on Code Generation and Optimization
SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance
Proceedings of the International Symposium on Code Generation and Optimization
Unified control flow and data dependence traces
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient field-sensitive pointer analysis of C
ACM Transactions on Programming Languages and Systems (TOPLAS)
Revisiting the Sequential Programming Model for Multi-Core
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Pipa: pipelined profiling and analysis on multi-core systems
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Embla - Data Dependence Profiling for Parallel Programming
CISIS '08 Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Estimating and exploiting potential parallelism by source-level dependence profiling
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Kremlin: like gprof, but for parallelization
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Kremlin: rethinking and rebooting gprof for the multicore age
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
General data structure expansion for multi-threading
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Hi-index | 0.00 |
Retrofitting existing software for the increasingly dominant multicore microprocessors has a strong appeal from the economic point of view. One of the key issues in such an effort is to fully understand the data dependences in the existing software. Unfortunately, current compilers have quite limited ability to analyze data dependences. Therefore, execution-driven data dependence profiling has gained significant interest because it can resolve memory access ambiguity exactly during program execution, which allows data dependences to be analyzed exactly. Although such dependence profiling is valid for specific inputs only, the insight it provides can be highly valuable to software engineers in their parallelization effort. On the other hand, dependence profiling itself can take tremendous memory and machine time. In this paper, we propose a novel dependence profiling method which, with the support of several new compiler and runtime techniques, partitions the profiling task into many independent slices, each requiring significantly less memory. Different slices can be profiled in parallel, producing subgraphs which are eventually combined automatically into the complete data dependence graph by the compiler. The slices can be extracted with different degrees of granularity. Experiments show that, for several well-known benchmark programs, our parallel scheme shortens the profiling time by a few orders of magnitude.