Iteration Disambiguation for Parallelism Identification in Time-Sliced Applications

Authors:
Shane Ryoo;Christopher I. Rodrigues;Wen-Mei W. Hwu
Affiliations:
Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,;Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,;Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,
Venue:
Languages and Compilers for Parallel Computing
Year:
2007

Citing 12
Cited 0

The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A safe approximate algorithm for interprocedural aliasing

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Solving shape-analysis problems in languages with destructive updating

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Using static single assignment form to improve flow-insensitive pointer analysis

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Pointer analysis: haven't we solved this problem yet?

PASTE '01 Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Instance-wise points-to analysis for loop-based dependence testing

ICS '02 Proceedings of the 16th international conference on Supercomputing
Flow analysis and optimization of LISP-like structures

POPL '79 Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Connection Analysis: A Practical Interprocedural Heap Analysis for C

LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Importance of heap specialization in pointer analysis

Proceedings of the 5th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Shape analysis with inductive recursion synthesis

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Media and scientific simulation applications have a large amount of parallelism that can be exploited in contemporary multi-core microprocessors. However, traditional pointer and array analysis techniques often fall short in automatically identifying this parallelism. This is due to the allocation and referencing patterns of time-slicing algorithms, where information flows from one time slice to the next. In these, an object is allocated within a loop and written to, with source data obtained from objects created in previous iterations of the loop. The objects are typically allocated at the same static call site through the same call chain in the call graph, making them indistinguishable by traditional heap-sensitive analysis techniques that use call chains to distinguish heap objects. As a result, the compiler cannot separate the source and destination objects within each time slice of the algorithm. In this work we discuss an analysis that quickly identifies these objects through a partially flow-sensitive technique called iteration disambiguation. This is done through a relatively simple aging mechanism. We show that this analysis can distinguish objects allocated in different time slices across a wide range of benchmark applications within tens of seconds even for complete media applications. We will also discuss the obstacles to automatically identifying the remaining parallelism in studied applications and propose methods to address them.