We investigate the problem of evaluating Fortran 90-style array expressions on massively parallel distributed-memory machines. On such a machine, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression tree. The choice of where to perform the operation then affects this cost.

We describe the communication cost of the parallel machine theoretically as a metric space; we model the alignment problem as that of finding a minimum-cost embedding of the expression tree into this space. We present algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics: multidimensional grids and rings, hypercubes, fat-trees, and the discrete metric. We also extend our approach to handle operations that change the shape of the arrays.
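
To make the embedding formulation concrete, here is a minimal sketch of a dynamic program over an expression tree. It is not the algorithm from the paper, whose metric-specific methods are not given in this abstract. It is a brute-force version that assumes a small finite set of candidate alignment positions, leaves with fixed positions, and a cost equal to the sum of metric distances along tree edges. The names `Node`, `ring_metric`, `discrete_metric`, and `min_embedding_cost` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """Expression-tree node. Leaves carry a fixed alignment position (assumed)."""
    pos: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def ring_metric(n: int) -> Callable[[int, int], int]:
    """Distance between two positions on a ring of n processors."""
    def dist(a: int, b: int) -> int:
        k = abs(a - b) % n
        return min(k, n - k)
    return dist


def discrete_metric(a: int, b: int) -> int:
    """Any realignment costs 1; staying in place costs 0."""
    return 0 if a == b else 1


def _cost_table(node: Node, positions: Sequence[int],
                dist: Callable[[int, int], int]) -> List[int]:
    """table[i] = minimum cost of having this subtree's value at positions[i]."""
    if not node.children:
        # A leaf array only has to be moved from its fixed position.
        return [dist(node.pos, p) for p in positions]
    child_tables = [_cost_table(c, positions, dist) for c in node.children]
    table = []
    for p in positions:
        total = 0
        for t in child_tables:
            # Each child is evaluated at its best position q, then moved to p.
            total += min(t[i] + dist(q, p) for i, q in enumerate(positions))
        table.append(total)
    return table


def min_embedding_cost(root: Node, positions: Sequence[int],
                       dist: Callable[[int, int], int]) -> int:
    """Minimum total communication cost over all placements of internal nodes."""
    return min(_cost_table(root, positions, dist))


# (A + B) * C with A, B, C aligned at positions 0, 2, 7 on an 8-processor ring.
expr = Node(children=[Node(children=[Node(pos=0), Node(pos=2)]), Node(pos=7)])
ring_cost = min_embedding_cost(expr, range(8), ring_metric(8))
discrete_cost = min_embedding_cost(expr, range(8), discrete_metric)
```

This sketch takes time proportional to the number of tree nodes times the square of the number of candidate positions, so it is only meant to illustrate the recurrence, not to be efficient.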