Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Static and dynamic evaluation of data dependence analysis
ICS '93 Proceedings of the 7th international conference on Supercomputing
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Run-time methods for parallelizing partially parallel loops
ICS '95 Proceedings of the 9th international conference on Supercomputing
On the Automatic Parallelization of the Perfect Benchmarks®
IEEE Transactions on Parallel and Distributed Systems
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Symbolic Analysis for Parallelizing Compilers
Symbolic Analysis for Parallelizing Compilers
Loop Parallelization
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Optimization of Data/Control Conditions in Task Graphs
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Achieving Multi-level Parallelization
ISHPC '97 Proceedings of the International Symposium on High Performance Computing
Locality Optimizations for Parallel Machines
CONPAR 94 - VAPP VI Proceedings of the Third Joint International Conference on Vector and Parallel Processing: Parallel Processing
Interprocedural Analysis for Parallelization
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
Static Coarse Grain Task Scheduling with Cache Optimization Using OpenMP
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A New Task Scheduling Method for Distributed Programs which Require Memory Management in Grids
SAINT-W '04 Proceedings of the 2004 Symposium on Applications and the Internet-Workshops (SAINT 2004 Workshops)
Specific problems in programming multicore systems
CIMMACS '10 Proceedings of the 9th WSEAS international conference on computational intelligence, man-machine systems and cybernetics
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Performance of OSCAR multigrain parallelizing compiler on SMP servers
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hierarchical parallelism control for multigrain parallel processing
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on a SMP machine. OSCAR multigrain parallelizing compiler automatically generates parallelized code including OpenMP directives and its performance is evaluated on a commercial SMP machine. The coarse grain task parallel processing is important to improve the effective performance of wide range of multiprocessor systems from a single chip multiprocessor to a high performance computer beyond the limit of the loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes parallelism among tasks by "Earliest Executable Condition Analysis" considering control and data dependencies, statically schedules the coarse grain tasks to threads or generates dynamic task scheduling codes to assign the tasks to threads and generates OpenMP Fortran source code for a SMP machine. The thread parallel code using OpenMP generated by OSCAR compiler forks threads only once at the beginning of the program and joins only once at the end even though the program is processed in parallel based on hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on 8-processor SMP machine, IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of OSCAR multigrain compiler. The evaluation shows that OSCAR compiler with IBM XL Fortran compiler version 5.1 gives us 1.5 to 3 times larger speedup than the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, MGRID and Perfect Benchmarks ARC2D.