Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Gated SSA-based demand-driven symbolic analysis for parallelizing compilers
ICS '95 Proceedings of the 9th international conference on Supercomputing
Idiom recognition in the Polaris parallelizing compiler
ICS '95 Proceedings of the 9th international conference on Supercomputing
Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Adaptive reduction parallelization techniques
Proceedings of the 14th international conference on Supercomputing
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
Monotonic evolution: an alternative to induction variable substitution for dependence analysis
ICS '01 Proceedings of the 15th international conference on Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Parallel Programming with Polaris
Computer
Enhancing Parallelism by Removing Cyclic Data Dependencies
PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
Applicability of Program Comprehension to Sparse Matrix Computations
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Data Dependence Testing in Practice
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Locality in the Run-Time Parallelization of Irregular Loops
ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Balanced, locality-based parallel irregular reductions
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
A compiler framework to detect parallelism in irregular codes
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Optimizing code parallelization through a constraint network based approach
Proceedings of the 43rd annual Design Automation Conference
Precise automatable analytical modeling of the cache behavior of codes with indirections
ACM Transactions on Architecture and Code Optimization (TACO)
XARK: An extensible framework for automatic recognition of computational kernels
ACM Transactions on Programming Languages and Systems (TOPLAS)
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Code scheduling for optimizing parallelism and data locality
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Data locality and parallelism optimization using a constraint-based approach
Journal of Parallel and Distributed Computing
An inspector-executor algorithm for irregular assignment parallelization
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Program behavior characterization through advanced kernel recognition
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Hi-index | 0.00 |
This paper presents a new approach for the detection of coarse-grain parallelism in loop nests that contain complex computations, including subscripted subscripts as well as conditional statements that introduce complex control flows at run-time. The approach is based on the recognition of the computational kernels calculated in a loop without considering the semantics of the code. The detection is carried out on top of the Gated Single Assignment (GSA) program representation at two different levels. First, the use-def chains between the statements that compose the strongly connected components (SCCs) of the GSA use-def chain graph are analyzed (intra-SCC analysis). As a result, the kernel computed in each SCC is recognized. Second, the use-def chains between statements of different SCCs are examined (inter-SCC analysis). This second abstraction level enables the detection of more complex computational kernels by the compiler. A prototype was implemented using the infrastructure provided by the Polaris compiler. Experimental results that show the effectiveness of our approach for the detection of coarse-grain parallelism in a suite of real codes are presented.