POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Execution time support for adaptive scientific algorithms on distributed
Concurrency: Practice and Experience
Improving the performance of runtime parallelization
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Runtime compilation techniques for data partitioning and communication schedule reuse
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Interprocedural partial redundancy elimination and its application to distributed memory compilation
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Index array flattening through program transformation
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Compiler and run-time support for irregular computations
Compiler and run-time support for irregular computations
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Exploiting spatial regularity in irregular iterative applications
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Compiler Analysis for Irregular Problems in Fortran D
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Techniques for Reducing the Overhead of Run-Time Parallelization
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Localizing Non-Affine Array References
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
High Performance Remote Memory Access Communication: The Armci Approach
International Journal of High Performance Computing Applications
Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Sensitivity analysis for automatic parallelization on multi-cores
Proceedings of the 21st annual international conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Exploiting Parallelism with Dependence-Aware Scheduling
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
A model for fusion and code motion in an automatic parallelizing compiler
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Loop transformation recipes for code generation and auto-tuning
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hybrid Hexagonal/Classical Tiling for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Parallelization and locality optimization of affine loop nests has been successfully addressed for shared-memory machines. However, many large-scale simulation applications must be executed in a distributed-memory environment, and use irregular/sparse computations where the control-flow and array-access patterns are data-dependent. In this paper, we propose an approach for effective parallel execution of a class of irregular loop computations in a distributed-memory environment, using a combination of static and runtime analysis. We discuss algorithms that analyze sequential code to generate an inspector and an executor. The inspector captures the data-dependent behavior of the computation in parallel and without requiring complete replication of any of the data structures used in the original computation. The executor performs the computation in parallel. The effectiveness of the framework is demonstrated on several benchmarks and a climate modeling application.