From serial loops to parallel execution on distributed systems

Authors:
George Bosilca;Aurelien Bouteiller;Anthony Danalis;Thomas Herault;Jack Dongarra
Affiliations:
University of Tennessee, Knoxville, TN;University of Tennessee, Knoxville, TN;University of Tennessee, Knoxville, TN;University of Tennessee, Knoxville, TN;University of Tennessee, Knoxville, TN, USA, University of Manchester, Manchester, UK
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Year:
2012


Abstract

Programmability and performance portability are two major challenges in today's dynamic environment. Algorithm designers should focus on designing high-level algorithms that exhibit maximum parallelism, and rely on compilers and run-time systems to discover and exploit that parallelism and deliver sustainable performance on a variety of hardware. The compiler tool presented in this paper analyzes the data flow of serial codes containing imperfectly nested, affine loop nests and if statements, which are commonly found in scientific applications. The tool operates as the front-end compiler for the DAGuE run-time system, automatically converting serial codes into a symbolic representation of their data flow. We show how the compiler analyzes the data flow and demonstrate that scientifically important dense linear algebra operations can benefit from this analysis and deliver high performance on large-scale platforms.
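To make the idea of a symbolic data-flow representation concrete, here is a minimal sketch. It is not the paper's compiler and does not use the DAGuE input format. The toy loop nest, the task names F and U, and both function names are hypothetical and chosen only for illustration. The first function finds read-after-write edges by brute-force enumeration for a fixed problem size. The second generates the same edges from closed-form rules over the loop indices, which is the kind of compact description a symbolic analysis would aim to derive without enumeration.

```python
def flow_edges(n):
    """Enumerate read-after-write edges of a toy loop nest by brute force.

    Hypothetical serial code being analyzed (an assumption, not from the paper):
        for k in range(n):
            F(A[k])                  # reads and writes A[k]
            for m in range(k + 1, n):
                U(A[k], A[m])        # reads A[k]; reads and writes A[m]
    """
    last_writer = {}   # data item -> task instance that last wrote it
    edges = []         # (producer task, consumer task, data item)

    def run(task, reads, writes):
        # A read of an item that already has a writer creates a flow edge.
        for item in reads:
            if item in last_writer:
                edges.append((last_writer[item], task, item))
        for item in writes:
            last_writer[item] = task

    for k in range(n):
        run(("F", k), reads=[("A", k)], writes=[("A", k)])
        for m in range(k + 1, n):
            run(("U", k, m), reads=[("A", k), ("A", m)], writes=[("A", m)])
    return edges


def symbolic_edges(n):
    """Generate the same edges from closed-form rules on (k, m).

    Rules, valid for 0 <= k < m < n:
        F(k)    -> U(k, m)      on A[k]
        U(k, m) -> F(m)         on A[m]  if m == k + 1
        U(k, m) -> U(k + 1, m)  on A[m]  if m >  k + 1
    """
    edges = set()
    for k in range(n):
        for m in range(k + 1, n):
            edges.add((("F", k), ("U", k, m), ("A", k)))
            if m == k + 1:
                edges.add((("U", k, m), ("F", m), ("A", m)))
            else:
                edges.add((("U", k, m), ("U", k + 1, m), ("A", m)))
    return edges


if __name__ == "__main__":
    for producer, consumer, item in flow_edges(3):
        print(producer, "->", consumer, "via", item)
```

The rules in `symbolic_edges` stay the same size regardless of `n`, whereas the enumerated edge list grows with the problem size. This is why a symbolic description is attractive as input to a run-time system.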