A Scheme to Enforce Data Dependence on Large Multiprocessor Systems
IEEE Transactions on Software Engineering
Compiler algorithms for synchronization
IEEE Transactions on Computers
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Improving the performance of runtime parallelization
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable method for run-time loop parallelization
International Journal of Parallel Programming
IEEE Transactions on Parallel and Distributed Systems
A practical run-time technique for exploiting loop-level parallelism
Journal of Systems and Software
An efficient algorithm for the run-time parallelization of DOACROSS loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
An exact data dependence testing method for quadratic expressions
Information Sciences: an International Journal
Hi-index | 0.07 |
Most scientific applications rely on parallel multiprocessor computing to enhance performance. However, the irregular loops within these applications obstruct the parallelism analysis at compile-time. Rauchwerger et al. presented a run-time method to extract the hidden parallelism in a program using dependence chains. The relative overhead degrades this approach's performance due to the mass storage requirement and huge array reference processing. In this study, a new predecessor/successor approach is developed in which high-level predecessor/successor information is recorded and processed efficiently. A predecessor/successor table is constructed first in the inspector phase so that only the successor iterations in the current wavefront need to be examined, instead of the entire loop iterations during wavefront scheduling. Usually, the performance of dependence chain approach degrades dramatically for a hot-spot access pattern, but our scheme works very efficiently in this case. The experimental results using synthetic code and real programs are presented to prove the superiority of the proposed approach.