Static coarse grain task scheduling with cache optimization using OpenMP

Authors:
Hirofumi Nakano;Kazuhisa Ishizaka;Motoki Obata;Keiji Kimura;Hironori Kasahara
Affiliations:
Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo, 169-8555, Japan;Waseda University & Japanese Millennium Project IT 21 Advanced Parallelizing Compiler Project;Waseda University & Japanese Millennium Project IT 21 Advanced Parallelizing Compiler Project;Waseda University & Japanese Millennium Project IT 21 Advanced Parallelizing Compiler Project;Waseda University & Japanese Millennium Project IT 21 Advanced Parallelizing Compiler Project
Venue:
International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
Year:
2003

Citing 5

An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing

SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing

Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

A Multi-Grain Parallelizing Compilation Scheme for OSCAR (Optimally Scheduled Advanced Multiprocessor)
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing

Coarse grain task parallel processing with cache optimization on shared memory multiprocessor
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Cited 2

Foundations for the integration of scheduling techniques into compilers for parallel languages
International Journal of Computational Science and Engineering

Resource management of distributed virtual machines
International Journal of Ad Hoc and Ubiquitous Computing

Abstract

Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop iteration level parallelism. With these factors in mind, this paper proposes a static scheduling scheme for coarse grain tasks that takes cache optimization into account. At compile time, the scheme first decomposes tasks and data according to the cache size, then schedules the coarse grain tasks to threads so that data shared among tasks can be passed through the cache. It is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPECfp95. On 4 processors, the proposed scheme achieves a speedup of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC Ver. 6 update 1 loop parallelizing compiler.
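The idea in the abstract can be illustrated with a minimal sketch. It is not the OSCAR compiler's algorithm: the function names `decompose` and `static_schedule`, the loop labels, and the 4 MiB cache size are all hypothetical, and the sketch assumes that the loops carry no dependences across chunk boundaries. It splits an index range into chunks whose working set fits in cache, then statically assigns every loop over a given chunk to the same thread, so that consecutive tasks on that thread touch the same data.

```python
from math import ceil


def decompose(n_elems, elem_bytes, arrays_per_task, cache_bytes):
    """Split [0, n_elems) into chunks whose working set fits in cache.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if n_elems <= 0 or cache_bytes <= 0:
        raise ValueError("n_elems and cache_bytes must be positive")
    working_set = n_elems * elem_bytes * arrays_per_task
    n_chunks = max(1, ceil(working_set / cache_bytes))
    size = ceil(n_elems / n_chunks)
    return [(lo, min(lo + size, n_elems)) for lo in range(0, n_elems, size)]


def static_schedule(loops, chunks, n_threads):
    """Assign all loops over one chunk to one thread, round-robin by chunk.

    Assumes loops are independent across chunks (a simplification).
    Returns one ordered task list per thread; each task is (loop, chunk).
    """
    schedule = [[] for _ in range(n_threads)]
    for group, chunk in enumerate(chunks):
        thread = group % n_threads
        for loop in loops:
            # Tasks sharing a chunk run back to back on the same thread,
            # so the chunk's data can stay in that processor's cache.
            schedule[thread].append((loop, chunk))
    return schedule


if __name__ == "__main__":
    chunks = decompose(1_000_000, 8, 2, 4 * 1024 * 1024)
    plan = static_schedule(["loop_a", "loop_b"], chunks, 4)
    for thread, tasks in enumerate(plan):
        print(thread, tasks)
```

In this sketch the schedule is computed once, ahead of execution, which mirrors the static (compile-time) character of the scheme; in an OpenMP setting each per-thread list would correspond to the code one thread executes inside a parallel region.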