Data-parallel languages like Fortran 90 express parallelism in the form of operations on data aggregates such as arrays. Misalignment of the operands of an array operation can reduce program performance on a distributed-memory parallel machine by requiring nonlocal data accesses. Determining array alignments that reduce communication is therefore a key issue in compiling such languages.

We present a framework for the automatic determination of array alignments in data-parallel languages such as Fortran 90. Our language model handles array sectioning, reductions, spreads, transpositions, and masked operations. We decompose alignment functions into three constituents: axis, stride, and offset. For each of these subproblems, we show how to solve the alignment problem for a basic block of code, possibly containing common subexpressions. Alignments are generated for all array objects in the code, both named program variables and intermediate results. The alignments obtained by our algorithms are more general than those provided by the “owner-computes” rule. Finally, we present some ideas for dealing with control flow, replication, and dynamic alignments that depend on loop induction variables.
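The decomposition of an alignment into axis, stride, and offset can be illustrated with a small sketch. It is not the algorithm described above. It only assumes a simple model in which one array axis maps to a template axis by position = stride * index + offset, and it uses zero-based indices. The names `AxisAlignment`, `section`, and `misalignment` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisAlignment:
    """Alignment of one array axis to a template axis (assumed model)."""
    template_axis: int  # template axis this array axis is mapped to
    stride: int         # template spacing between consecutive elements
    offset: int         # template position of array index 0

    def position(self, index: int) -> int:
        # Template position of a given array index under this alignment.
        return self.stride * index + self.offset


def section(align: AxisAlignment, start: int, step: int) -> AxisAlignment:
    """Alignment of the section a[start::step], re-indexed from 0."""
    return AxisAlignment(
        template_axis=align.template_axis,
        stride=align.stride * step,
        offset=align.position(start),
    )


def misalignment(x: AxisAlignment, y: AxisAlignment) -> dict:
    """Report which of the three components differ between two operands."""
    return {
        "axis": x.template_axis != y.template_axis,
        "stride": x.stride != y.stride,
        "offset": x.offset != y.offset,
    }


# Operands of an elementwise sum like A[0:n-1] + B[1:n].
a = AxisAlignment(template_axis=0, stride=1, offset=0)
b = AxisAlignment(template_axis=0, stride=1, offset=0)
print(misalignment(section(a, 0, 1), section(b, 1, 1)))

# Shifting B's alignment by one template cell removes the offset mismatch.
b_shifted = AxisAlignment(template_axis=0, stride=1, offset=-1)
print(misalignment(section(a, 0, 1), section(b_shifted, 1, 1)))
```

In this model, a mismatch in any component means the operands of the elementwise operation do not coincide on the template, so some data would have to move. Choosing the alignment of `B` to cancel the section's shift is one example of trading a static alignment choice against communication.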