Validity of Interprocedural Data Remapping
This paper describes MARS, an automatic parallelising compiler targeted at shared memory machines. It uses a data partitioning approach, traditionally used for distributed memory machines, to globally reduce overheads such as communication and synchronisation. Its high-level linear algebraic representation allows the direct application of, for instance, unimodular transformations and the global application of data transformations. Although a data-based approach allows global analysis and in many instances outperforms local, loop-orientated parallelisation approaches, we have identified two particular problems when applying data parallelism to sequential Fortran 77 as opposed to data parallel dialects tailored to distributed memory targets. This paper describes two techniques to overcome these problems and evaluates their applicability. Preliminary results on two SPECfp92 benchmarks show that, with these optimisations, MARS outperforms existing state-of-the-art loop-based auto-parallelisation approaches.
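To make the linear algebraic representation concrete, the sketch below shows how a compiler can express loop interchange as a unimodular matrix applied to the iteration space. This is an illustrative Python sketch only, not MARS's actual implementation: the matrix `U`, the helper `apply_unimodular`, and the example loop bounds are all assumptions for the purpose of the example.

```python
# Illustrative sketch: loop interchange as a unimodular transformation of
# the iteration space (hypothetical helper, not the MARS compiler's code).

def apply_unimodular(U, iterations):
    """Map each iteration vector i to the matrix-vector product U * i."""
    return [
        tuple(sum(U[r][c] * i[c] for c in range(len(i))) for r in range(len(U)))
        for i in iterations
    ]

# Original 2-deep loop nest: for i in 0..2: for j in 0..3
original = [(i, j) for i in range(3) for j in range(4)]

# Interchange matrix: swaps the i and j loops. Its determinant is -1,
# so the matrix is unimodular and maps integer points to integer points
# bijectively.
U = [[0, 1],
     [1, 0]]

transformed = apply_unimodular(U, original)

# A unimodular transformation is a bijection on the iteration space:
# the same iterations are executed, only visited in a different order.
assert sorted(transformed) == sorted((j, i) for i, j in original)
```

Because any unimodular matrix preserves the integer lattice, the same machinery covers interchange, reversal, and skewing uniformly, which is what makes a single linear algebraic framework attractive for combining loop and data transformations.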