In this paper we propose a new approach to the hyperplane problem, also known as ''wavefront'' computation. In direct contrast to most approaches, which reduce the problem to an integer programming problem or apply heuristics, we gather information at compile time and delegate the solution to run time. We present an adaptive technique that computes which new threads can execute in the next computation cycle based on which threads execute in the current one. Moving the solution to the run-time environment gives us greater versatility, and the underlying hyperplane pattern is discovered exactly, without any prior calculation. The main contribution of this paper is a self-adaptive algorithm that does not need to know the tile size (which controls the granularity of parallelism) beforehand; instead, the algorithm adapts the tile size while the program runs in order to achieve optimal efficiency. Experimental results show that, given a sufficient number of parallel processing elements to diffuse the scheduler's workload, the scheduler's overhead becomes low enough to be overshadowed by the net gain in parallelism. For the implementation of the proposed algorithm and for our experiments we use our parallelizing compiler C2@mTC/SL, a C parallelizing compiler that maps sequential programs onto the SVP processor and model.