Quantifying the potential task-based dataflow parallelism in MPI applications

  • Authors:
  • Vladimir Subotic;Roger Ferrer;Jose Carlos Sancho;Jesús Labarta;Mateo Valero

  • Affiliations:
  • Barcelona Supercomputing Center, Universitat Politecnica de Catalunya (all authors)

  • Venue:
  • Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
  • Year:
  • 2011

Abstract

Task-based parallel programming languages require the programmer to partition traditional sequential code into smaller tasks in order to exploit the dataflow parallelism inherent in the application. However, finding the partitioning that achieves optimal parallelism is not trivial, because it depends on many parameters, such as the underlying data dependencies and the global problem partitioning. To help find a partitioning that achieves high parallelism, this paper introduces a framework that a programmer can use to: 1) estimate how much an application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in that application. Our framework automatically detects data dependencies among tasks in order to estimate the potential parallelism in the application. Furthermore, based on the framework, we develop an interactive approach to find the optimal partitioning of code. To illustrate this approach, we present a case study of porting High Performance Linpack from MPI to MPI/SMPSs. The presented approach requires only superficial knowledge of the studied code and iteratively leads to the optimal partitioning strategy. Finally, the environment provides visualization of the simulated MPI/SMPSs execution, allowing the developer to qualitatively inspect potential parallelization bottlenecks.
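
The idea of detecting data dependencies among tasks and deriving a potential parallelism figure from them can be illustrated with a minimal Python sketch. This is not the paper's framework or its API: the names `Task`, `detect_dependencies` and `potential_parallelism` are hypothetical, the task costs and buffer names are made up, and the sketch assumes that only read-after-write dependencies constrain execution and that an unlimited number of cores is available. Under those assumptions, potential parallelism is estimated as total work divided by the length of the critical path.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A task in sequential program order (hypothetical model, not the paper's)."""
    name: str
    cost: float                                # assumed execution time
    reads: set = field(default_factory=set)    # buffers the task reads
    writes: set = field(default_factory=set)   # buffers the task writes


def detect_dependencies(tasks):
    """For each task, return the indices of earlier tasks it must wait for.

    Assumption: only read-after-write (true dataflow) dependencies count.
    """
    last_writer = {}   # buffer name -> index of the most recent writer
    deps = []
    for i, task in enumerate(tasks):
        deps.append({last_writer[b] for b in task.reads if b in last_writer})
        for b in task.writes:
            last_writer[b] = i
    return deps


def potential_parallelism(tasks):
    """Total work divided by critical path length, assuming unlimited cores."""
    deps = detect_dependencies(tasks)
    finish = []
    for i, task in enumerate(tasks):
        # A task starts as soon as all tasks it depends on have finished.
        start = max((finish[j] for j in deps[i]), default=0.0)
        finish.append(start + task.cost)
    total_work = sum(task.cost for task in tasks)
    critical_path = max(finish, default=0.0)
    return total_work / critical_path if critical_path else 0.0


# Made-up example loosely shaped like a blocked factorization step.
example = [
    Task("factor_A0", 2.0, reads={"A0"}, writes={"A0"}),
    Task("update_A1", 3.0, reads={"A0", "A1"}, writes={"A1"}),
    Task("update_A2", 3.0, reads={"A0", "A2"}, writes={"A2"}),
    Task("factor_A1", 2.0, reads={"A1"}, writes={"A1"}),
]

if __name__ == "__main__":
    print(detect_dependencies(example))
    print(potential_parallelism(example))
```

Trying a different partitioning then amounts to changing the task list (for example, splitting `update_A1` into several smaller tasks on sub-blocks) and comparing the resulting estimates, which mirrors the iterative search the abstract describes.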