The Cray T3D and T3E are non-cache-coherent (NCC) computers with a NUMA structure. They have been shown to deliver very stable and scalable performance across a variety of application programs, and considerable evidence suggests that they are more stable and scalable than many other shared-memory multiprocessors. Their principal drawback, however, is poor programmability. They lack the global cache coherence needed to provide a convenient shared view of memory in hardware. This forces the programmer to keep careful track of where each piece of data is stored, a complication that does not arise when the user is presented with a pure shared-memory view. We believe that advanced compiler technology can remedy this problem. In this paper, we present our experience with a compiler framework for automatic parallelization and communication generation. The framework has the potential to reduce the time-consuming hand-tuning otherwise needed to achieve good performance on this type of machine. Our experiments show that the compiler performs well for a variety of applications on the T3D and T3E. They also identified several more sophisticated techniques that could improve performance further once they are fully implemented in the compiler.
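The sketch below shows what "keeping track of where each piece of data is stored" and communication generation involve on an NCC machine. It is a minimal Python simulation, not the paper's compiler, and it rests on several illustrative assumptions:

- A 1-D array is BLOCK-distributed across processors.
- Work is assigned by the owner-computes rule.
- The loop is a three-point stencil `b[i] = a[i-1] + a[i+1]`.

The function names `block_bounds`, `owner`, `comm_sets`, and `run_spmd` are hypothetical. Each processor's memory is modeled as a separate Python list. A read of a non-local element is modeled as an explicit copy from the owning processor's list. On a real NCC machine, that copy would be a one-sided remote transfer (for example, a SHMEM-style get).

```python
from typing import Dict, List, Tuple


def block_bounds(n: int, nprocs: int, p: int) -> Tuple[int, int]:
    """Half-open range [lo, hi) of global indices owned by processor p (BLOCK distribution)."""
    size = (n + nprocs - 1) // nprocs  # ceiling division: elements per block
    lo = min(p * size, n)
    hi = min(lo + size, n)
    return lo, hi


def owner(i: int, n: int, nprocs: int) -> int:
    """Processor that stores global element i."""
    size = (n + nprocs - 1) // nprocs
    return i // size


def comm_sets(n: int, nprocs: int) -> Dict[int, List[int]]:
    """For the hypothetical loop b[i] = a[i-1] + a[i+1], i in 1..n-2, return the
    indices of `a` each processor must fetch from other processors, assuming
    each processor executes the iterations whose b[i] it owns."""
    needs: Dict[int, List[int]] = {}
    for p in range(nprocs):
        lo, hi = block_bounds(n, nprocs, p)
        reads = set()
        for i in range(max(lo, 1), min(hi, n - 1)):
            reads.update((i - 1, i + 1))
        # Only reads that fall outside the local block require communication.
        needs[p] = sorted(j for j in reads if not lo <= j < hi)
    return needs


def run_spmd(a: List[float], nprocs: int) -> List[float]:
    """Simulate SPMD execution with per-processor local memories."""
    n = len(a)
    # Each processor's local memory holds only its own block of `a`.
    local = {p: a[slice(*block_bounds(n, nprocs, p))] for p in range(nprocs)}
    needs = comm_sets(n, nprocs)
    b = [0.0] * n
    for p in range(nprocs):
        lo, hi = block_bounds(n, nprocs, p)
        # Communication phase: explicitly copy each needed remote element
        # out of its owner's local memory (stands in for a remote get).
        fetched: Dict[int, float] = {}
        for j in needs[p]:
            q = owner(j, n, nprocs)
            qlo, _ = block_bounds(n, nprocs, q)
            fetched[j] = local[q][j - qlo]

        def read(j: int) -> float:
            # Local elements come from this processor's block; others from the fetch buffer.
            return local[p][j - lo] if lo <= j < hi else fetched[j]

        # Computation phase: owner-computes over the locally owned iterations.
        for i in range(max(lo, 1), min(hi, n - 1)):
            b[i] = read(i - 1) + read(i + 1)
    return b


if __name__ == "__main__":
    a = [float(x * x) for x in range(10)]
    print(comm_sets(10, 4))  # {0: [3], 1: [2, 6], 2: [5, 9], 3: []}
    print(run_spmd(a, 4))
```

Splitting execution into a communication phase followed by a computation phase mirrors the general idea of compiler-generated communication: the compiler derives the remote-access sets from the array subscripts, so the programmer does not have to track data placement by hand. How the paper's framework actually computes and schedules these sets is not shown here.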