Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

Authors:
Hironori Kasahara;Motoki Obata;Kazuhisa Ishizaka
Affiliations:
-;-;-
Venue:
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Year:
2000

Citing 23
Cited 7

Advanced compiler optimizations for supercomputers

Communications of the ACM - Special issue on parallelism
Parallel processing of near fine grain tasks using static scheduling OSCAR (optimally scheduled advanced multiprocessor)

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The Omega test: a fast and practical integer programming algorithm for dependence analysis

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Static and dynamic evaluation of data dependence analysis

ICS '93 Proceedings of the 7th international conference on Supercomputing
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Run-time methods for parallelizing partially parallel loops

ICS '95 Proceedings of the 9th international conference on Supercomputing
On the Automatic Parallelization of the Perfect Benchmarks®

IEEE Transactions on Parallel and Distributed Systems
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors

ICS '99 Proceedings of the 13th international conference on Supercomputing
Locality optimizations for multi-level caches

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Symbolic Analysis for Parallelizing Compilers

Symbolic Analysis for Parallelizing Compilers
Loop Parallelization

Loop Parallelization
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
OpenMP: An Industry-Standard API for Shared-Memory Programming

IEEE Computational Science & Engineering
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Optimization of Data/Control Conditions in Task Graphs

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A Multi-Grain Parallelizing Compilation Scheme for OSCAR (Optimally Scheduled Advanced Multiprocessor)

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Achieving Multi-level Parallelization

ISHPC '97 Proceedings of the International Symposium on High Performance Computing
Locality Optimizations for Parallel Machines

CONPAR 94 - VAPP VI Proceedings of the Third Joint International Conference on Vector and Parallel Processing: Parallel Processing
Interprocedural Analysis for Parallelization

LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study

ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing

Static Coarse Grain Task Scheduling with Cache Optimization Using OpenMP

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A New Task Scheduling Method for Distributed Programs which Require Memory Management in Grids

SAINT-W '04 Proceedings of the 2004 Symposium on Applications and the Internet-Workshops (SAINT 2004 Workshops)
Specific problems in programming multicore systems

CIMMACS '10 Proceedings of the 9th WSEAS international conference on computational intelligence, man-machine systems and cybernetics
Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Performance of OSCAR multigrain parallelizing compiler on SMP servers

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hierarchical parallelism control for multigrain parallel processing

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on a SMP machine. OSCAR multigrain parallelizing compiler automatically generates parallelized code including OpenMP directives and its performance is evaluated on a commercial SMP machine. The coarse grain task parallel processing is important to improve the effective performance of wide range of multiprocessor systems from a single chip multiprocessor to a high performance computer beyond the limit of the loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes parallelism among tasks by "Earliest Executable Condition Analysis" considering control and data dependencies, statically schedules the coarse grain tasks to threads or generates dynamic task scheduling codes to assign the tasks to threads and generates OpenMP Fortran source code for a SMP machine. The thread parallel code using OpenMP generated by OSCAR compiler forks threads only once at the beginning of the program and joins only once at the end even though the program is processed in parallel based on hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on 8-processor SMP machine, IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of OSCAR multigrain compiler. The evaluation shows that OSCAR compiler with IBM XL Fortran compiler version 5.1 gives us 1.5 to 3 times larger speedup than the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, MGRID and Perfect Benchmarks ARC2D.