Automatic program parallelization using stateless parallel processing architecture

Authors:
Feijian Sun;Yuan Shi
Affiliations:
Temple University;Temple University
Venue:
Automatic program parallelization using stateless parallel processing architecture
Year:
2004

Abstract

This thesis investigates a new approach to automatic sequential-to-parallel program translation for distributed-memory multicomputers that leverages a dataflow computation model. It is well known that dataflow computation models allow dependencies to be uncovered automatically, without complex static dependency analysis. This dissertation focuses on a coarse-grain dataflow model supported by a stateless parallel processing architecture. Under this model, the user identifies an overall dependency pattern, which is used to direct program partitioning and data distribution strategies. Nonlinear, indirect and conditional dependencies are uncovered automatically at runtime. In contrast, other approaches, such as the PARADIGM project at UIUC, have encountered insurmountable difficulties when attempting to resolve nonlinear and other dependencies at compile time. In this thesis, the dataflow computation model is provided by the Synergy system, a preliminary implementation of the stateless parallel processing architecture using multiple networked computers. The approach includes a new Parallelization Markup Language (PML), which is used to describe an overall dependency pattern by marking sequential program segments. A PML compiler translates the marked sequential program into multiple parallel programs using the extracted dependency pattern. Because it is based on XML, PML is portable and extensible, and it is completely independent of the programming language. The PML tags are also powerful enough to describe very complex dependency patterns (and thus data partitioning strategies). In theory, this methodology is applicable to all types of iterative compute-intensive applications. In this thesis, we have chosen four well-recognized numerical applications to illustrate the practical utility and efficiency of this method: Matrix Multiplication, a Laplacian Solver using Gauss-Seidel iterations, an Ion Generation Simulator, and Block LU Factorization. We show that performance measurements from generated programs compare favorably against manually written parallel programs. (Abstract shortened by UMI.)
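
The coarse-grain dataflow idea in the abstract can be illustrated with a minimal sketch for the first of the four applications, matrix multiplication. This is not the Synergy system or PML output: it assumes Python threads and in-memory queues as stand-ins for networked computers, and the names `matmul_rows`, `worker`, and `parallel_matmul` are hypothetical. Each worker is stateless in the sense that everything it needs arrives inside the task it picks up, and the row-block partition plays the role of the user-supplied dependency pattern.

```python
import queue
import threading


def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def worker(tasks, results):
    """Stateless worker: each task carries all the data it needs."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work for this worker
            break
        start, a_rows, b = task
        results.put((start, matmul_rows(a_rows, b)))


def parallel_matmul(a, b, block=2, n_workers=3):
    """Partition A by row blocks and let workers pull blocks on demand."""
    tasks, results = queue.Queue(), queue.Queue()
    n_tasks = 0
    for start in range(0, len(a), block):
        tasks.put((start, a[start:start + block], b))
        n_tasks += 1
    for _ in range(n_workers):
        tasks.put(None)  # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble the result; blocks may have finished in any order.
    c = [None] * len(a)
    for _ in range(n_tasks):
        start, rows = results.get()
        c[start:start + len(rows)] = rows
    return c
```

Calling `parallel_matmul([[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10]])` returns the same product as a sequential triple loop, regardless of which worker computed which block. Because workers pull tasks when they become free, the load balances itself without any compile-time analysis of the loop nest.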