SD3: A Scalable Approach to Dynamic Data-Dependence Profiling

Authors:
Minjang Kim, Hyesoon Kim, Chi-Keung Luk
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

References: 23
Cited by: 14

Reference list:
Cilk: an efficient multithreaded runtime system. PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming.
Advanced compiler design and implementation.
Whole program paths. Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation.
A scalable approach to thread-level speculation. Proceedings of the 27th annual international symposium on Computer architecture.
Introduction to Algorithms.
The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization. IEEE Transactions on Parallel and Distributed Systems.
Loop-Level Parallelism in Numeric and Symbolic Programs. IEEE Transactions on Parallel and Distributed Systems.
A cost-driven compilation framework for speculative parallelization of sequential programs. Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation.
Whole Execution Traces. Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture.
Pin: building customized program analysis tools with dynamic instrumentation. Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation.
POSH: a TLS compiler that exploits program structure. Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming.
Compilers: Principles, Techniques, and Tools (2nd Edition).
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies. ACM Transactions on Programming Languages and Systems (TOPLAS).
Shadow Profiling: Hiding Instrumentation Costs with Parallelism. Proceedings of the International Symposium on Code Generation and Optimization.
SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance. Proceedings of the International Symposium on Code Generation and Optimization.
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.
Modeling optimistic concurrency using quantitative dependence analysis. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming.
Pipa: pipelined profiling and analysis on multi-core systems. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization.
Visualizing potential parallelism in sequential programs. Proceedings of the 17th international conference on Parallel architectures and compilation techniques.
Compiler-Driven Dependence Profiling to Guide Program Parallelization. Languages and Compilers for Parallel Computing.
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping. Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation.
Alchemist: A Transparent Dependence Distance Profiling Infrastructure. Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization.
A concurrent dynamic analysis framework for multicore hardware. Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications.

Papers citing this work:
Kremlin: rethinking and rebooting gprof for the multicore age. Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation.
Parkour: parallel speedup estimates for serial programs. HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism.
Kismet: parallel speedup estimates for serial programs. Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications.
ATDetector: improving the accuracy of a commercial data race detector by identifying address transfer. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.
Efficient and accurate data dependence profiling using software signatures. Proceedings of the Tenth International Symposium on Code Generation and Optimization.
VMAD: an advanced dynamic program analysis and instrumentation framework. CC'12 Proceedings of the 21st international conference on Compiler Construction.
Fast loop-level data dependence profiling. Proceedings of the 26th ACM international conference on Supercomputing.
Multi-slicing: a compiler-supported parallel approach to data dependence profiling. Proceedings of the 2012 International Symposium on Software Testing and Analysis.
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization. MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
General data structure expansion for multi-threading. Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation.
Hybrid and multicore optimized architectures for test and simulation systems. Proceedings of the 6th International ICST Conference on Simulation Tools and Techniques.
CUBIT: compact bitmap profiling for dynamic data dependence analysis. Proceedings of the 2013 Research in Adaptive and Convergent Systems.
Online dynamic dependence analysis for speculative polyhedral parallelization. Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing.
Integrating profile-driven parallelism detection and machine-learning-based mapping. ACM Transactions on Architecture and Code Optimization (TACO).

Abstract

As multicore processors are deployed in mainstream computing, the need for software tools that help parallelize programs is increasing dramatically. Data-dependence profiling is an important technique for exploiting parallelism in programs: manual or automatic parallelization can use its outcomes to decide where in a program to parallelize. However, state-of-the-art data-dependence profiling techniques do not scale, because they suffer from two major issues when profiling large, long-running applications: (1) runtime overhead and (2) memory overhead. Existing data-dependence profilers either cannot profile large-scale applications or report only very limited information. In this paper, we propose a scalable approach to data-dependence profiling that addresses both runtime and memory overhead in a single framework. Our technique, called SD3, reduces the runtime overhead by parallelizing the dependence profiling step itself. To reduce the memory overhead, we compress memory accesses that exhibit stride patterns and compute data dependences directly in the compressed format. We demonstrate that SD3 reduces the runtime overhead of profiling SPEC 2006 by factors of 4.1 and 9.7 on eight and 32 cores, respectively. For the memory overhead, we successfully profile SPEC 2006 with the reference input, whereas previous approaches fail even with the train input. In some cases, we observe more than a 20X improvement in memory consumption and a 16X speedup in profiling time when 32 cores are used.
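The memory-reduction idea above can be illustrated with a small sketch. It stores stride-patterned accesses as (start, stride, count) records and checks whether two such records touch a common address without expanding them. This is not SD3's implementation, and the names `Stride`, `compress`, `overlaps`, and `may_depend` are hypothetical. The design choices are assumptions:

- The run detector is greedy and requires at least three accesses.
- The overlap test is exact: a GCD test plus a Chinese-remainder solve plus bounds.
- Only address overlap is checked. Access order, dependence type (RAW/WAR/WAW), and dependence distance are ignored.

It requires Python 3.8+ for `pow(x, -1, m)`.

```python
from dataclasses import dataclass
from math import gcd
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Stride:
    """Hypothetical compressed record: start, start+stride, ..., start+(count-1)*stride."""
    start: int
    stride: int  # assumed > 0
    count: int   # assumed >= 1

    @property
    def last(self) -> int:
        return self.start + (self.count - 1) * self.stride


Record = Union[int, Stride]  # a lone address or a compressed run


def compress(addrs: List[int], min_run: int = 3) -> List[Record]:
    """Greedily fold runs of >= min_run addresses with a constant positive step."""
    out: List[Record] = []
    i, n = 0, len(addrs)
    while i < n:
        j = i + 1
        if j < n:
            step = addrs[j] - addrs[i]
            # Extend the run while the step stays constant.
            while step > 0 and j + 1 < n and addrs[j + 1] - addrs[j] == step:
                j += 1
            if step > 0 and j - i + 1 >= min_run:
                out.append(Stride(addrs[i], step, j - i + 1))
                i = j + 1
                continue
        out.append(addrs[i])  # no stride pattern: keep the raw address
        i += 1
    return out


def _common_residue(a1: int, s1: int, a2: int, s2: int) -> Optional[Tuple[int, int]]:
    """Solve x = a1 (mod s1) and x = a2 (mod s2); return (x0, lcm) or None."""
    g = gcd(s1, s2)
    if (a2 - a1) % g:
        return None  # GCD test: the two progressions can never meet
    m = s2 // g
    t = ((a2 - a1) // g * pow(s1 // g, -1, m)) % m if m > 1 else 0
    lcm = s1 // g * s2
    return (a1 + s1 * t) % lcm, lcm


def overlaps(p: Stride, q: Stride) -> bool:
    """Exact test: do two strided access sets share at least one address?"""
    lo, hi = max(p.start, q.start), min(p.last, q.last)
    if lo > hi:
        return False  # address ranges are disjoint
    sol = _common_residue(p.start, p.stride, q.start, q.stride)
    if sol is None:
        return False
    x0, lcm = sol
    first = x0 + (lo - x0 + lcm - 1) // lcm * lcm  # smallest common address >= lo
    return first <= hi


def as_stride(r: Record) -> Stride:
    """Treat a lone address as a one-element run."""
    return r if isinstance(r, Stride) else Stride(r, 1, 1)


def may_depend(stores: List[Record], loads: List[Record]) -> bool:
    """Address-overlap check between compressed store and load streams."""
    return any(overlaps(as_stride(s), as_stride(ld)) for s in stores for ld in loads)
```

For example, `compress([100, 104, 108, 112, 500, 600])` folds the first four addresses into `Stride(100, 4, 4)` and keeps `500` and `600` as raw addresses. With this representation, memory grows with the number of distinct access patterns rather than the number of accesses. A real profiler would also need to record when each access happened in order to classify and orient a dependence.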