Predecessor/successor approach for high-performance run-time wavefront scheduling

Authors:
Tsung-Chuan Huang; Po-Hsueh Hsu
Affiliations:
Department of Electrical Engineering, National Sun Yat-sen University, 804 Kaohsiung, Taiwan, ROC; Department of Electronic Engineering, Cheng Shiu Institute of Technology, 833 Kaohsiung Hsien, Taiwan, ROC
Venue:
Information Sciences: an International Journal
Year:
2006

Citing (10):
A Scheme to Enforce Data Dependence on Large Multiprocessor Systems. IEEE Transactions on Software Engineering
Compiler algorithms for synchronization. IEEE Transactions on Computers
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design. IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Run-Time Parallelization and Scheduling of Loops. IEEE Transactions on Computers
Improving the performance of runtime parallelization. PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable method for run-time loop parallelization. International Journal of Parallel Programming
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. IEEE Transactions on Parallel and Distributed Systems
A practical run-time technique for exploiting loop-level parallelism. Journal of Systems and Software
An efficient algorithm for the run-time parallelization of DOACROSS loops. Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time. IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing

Cited by (1):
An exact data dependence testing method for quadratic expressions. Information Sciences: an International Journal

Abstract

Most scientific applications rely on parallel multiprocessor computing to enhance performance. However, irregular loops within these applications hinder parallelism analysis at compile time. Rauchwerger et al. presented a run-time method that uses dependence chains to extract the hidden parallelism in a program. The performance of that approach suffers from considerable overhead, caused by its large storage requirement and the processing of a huge number of array references. In this study, a new predecessor/successor approach is developed in which high-level predecessor/successor information is recorded and processed efficiently. A predecessor/successor table is first constructed in the inspector phase so that, during wavefront scheduling, only the successor iterations of the current wavefront need to be examined rather than all loop iterations. The performance of the dependence chain approach usually degrades dramatically for a hot-spot access pattern, whereas our scheme remains efficient in this case. Experimental results on synthetic code and real programs are presented to demonstrate the superiority of the proposed approach.
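
The general idea of a predecessor/successor table can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no data layout, so the function names `build_ps_table` and `schedule_wavefronts`, the per-iteration `(reads, writes)` input format, and the treatment of flow, anti, and output dependences are all assumptions made for illustration. The inspector records, for every iteration, how many predecessors it has and which iterations succeed it; the scheduler then builds each wavefront by visiting only the successors of the previous one.

```python
from collections import defaultdict


def build_ps_table(accesses):
    """Inspector phase (illustrative): accesses[i] = (reads, writes), the
    array elements touched by iteration i. Returns (pred_count, succ)."""
    n = len(accesses)
    succ = [set() for _ in range(n)]
    last_writer = {}              # element -> last iteration that wrote it
    readers = defaultdict(list)   # element -> readers since the last write

    for i, (reads, writes) in enumerate(accesses):
        for x in reads:
            if x in last_writer:              # flow dependence (write -> read)
                succ[last_writer[x]].add(i)
        for x in writes:
            if x in last_writer:              # output dependence (write -> write)
                succ[last_writer[x]].add(i)
            for r in readers[x]:              # anti dependence (read -> write)
                succ[r].add(i)
        # Record this iteration's own accesses only after all checks.
        for x in reads:
            readers[x].append(i)
        for x in writes:
            last_writer[x] = i
            readers[x] = []

    pred_count = [0] * n
    for i in range(n):
        for j in succ[i]:
            pred_count[j] += 1
    return pred_count, succ


def schedule_wavefronts(pred_count, succ):
    """Scheduling phase (illustrative): each wavefront is derived from the
    successors of the previous one, without rescanning the whole loop."""
    remaining = list(pred_count)
    current = [i for i, c in enumerate(remaining) if c == 0]
    wavefronts = []
    while current:
        wavefronts.append(current)
        nxt = []
        for i in current:
            for j in succ[i]:
                remaining[j] -= 1
                if remaining[j] == 0:   # all predecessors already scheduled
                    nxt.append(j)
        current = sorted(nxt)
    return wavefronts


# Hypothetical 5-iteration loop; each pair lists (reads, writes).
example_accesses = [([5], [0]), ([0], [1]), ([6], [2]), ([1], [3]), ([7], [0])]
pred_count, succ = build_ps_table(example_accesses)
wavefronts = schedule_wavefronts(pred_count, succ)
print(wavefronts)  # [[0, 2], [1], [3, 4]]
```

In this sketch the scheduling cost is proportional to the number of dependence edges rather than to the number of wavefronts times the loop length. Iterations within one wavefront are mutually independent and could be executed in parallel.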