Computer
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory storage patterns in parallel processing
Memory storage patterns in parallel processing
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Semi-automatic process partitioning for parallel computation
International Journal of Parallel Programming
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Data optimization: allocation of arrays to reduce communication on SIMD machines
Journal of Parallel and Distributed Computing - Massively parallel computation
Supporting shared data structures on distributed memory architectures
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Compile-time techniques for parallel execution of loops on distributed memory multiprocessors
Compile-time techniques for parallel execution of loops on distributed memory multiprocessors
Compiling programs for nonshared memory machines
Compiling programs for nonshared memory machines
Compiler techniques for data partitioning of sequentially iterated parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Compiling for locality of reference
Compiling for locality of reference
Access normalization: loop restructuring for NUMA compilers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
PARADIGM: a compiler for automatic data distribution on multicomputers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Toward automatic partitioning of arrays on distributed memory computers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A novel approach towards automatic data distribution
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Reducing communication by honoring multiple alignments
ICS '95 Proceedings of the 9th international conference on Supercomputing
Mappings for communication minimization using distribution and alignment
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Data-localization for Fortran macro-dataflow computation using partial static task assignment
ICS '96 Proceedings of the 10th international conference on Supercomputing
Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
A hyperplane based approach for optimizing spatial locality in loop nests
ICS '98 Proceedings of the 12th international conference on Supercomputing
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
The Journal of Supercomputing
Deriving Array Distributions by Optimization Techniques
The Journal of Supercomputing
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler optimizations for scalable parallel systems
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors
Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
A compilation method for communication—efficient partitioning of DOALL loops
Compiler optimizations for scalable parallel systems
Compiler optimization of dynamic data distributions for distributed-memory multicomputers
Compiler optimizations for scalable parallel systems
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Journal of Supercomputing
Communication Optimization for Affine Recurrence Equations Using Broadcast and Locality
International Journal of Parallel Programming
Supporting Irregular Distributions Using Data-Parallel Languages
IEEE Parallel & Distributed Technology: Systems & Technology
A Layout-Conscious Iteration Space Transformation Technique
IEEE Transactions on Computers
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Scheduling the Computations of a Loop Nest with Respect to a Given Mapping
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Improving Locality in Out-of-Core Computations Using Data Layout Transformations
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing
The Journal of Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Linear data distribution based on index analysis
High performance scientific and engineering computing
Improving whole-program locality using intra-procedural and inter-procedural transformations
Journal of Parallel and Distributed Computing
Automatic array partitioning based on the Smith normal form
International Journal of Parallel Programming
Storage assignment during high-level synthesis for configurable architectures
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
The Journal of Supercomputing
Memetic algorithms for parallel code optimization
International Journal of Parallel Programming
A flexible processor mapping technique toward data localization for block-cyclic data redistribution
The Journal of Supercomputing
Applying Data Mapping Techniques to Vector DSPs
Journal of Signal Processing Systems
Parallel image processing with the block data parallel architecture
IBM Journal of Research and Development
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Compiler-assisted data distribution for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Memory partitioning and scheduling co-optimization in behavioral synthesis
Proceedings of the International Conference on Computer-Aided Design
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
A solution to the problem of partitioning data for distributed memory machines is discussed. The solution uses a matrix notation to describe array accesses in fully parallel loops, which allows the derivation of sufficient conditions for communication-free partitioning (decomposition) of arrays. A series of examples that illustrate the effectiveness of the technique for linear references, the use of loop transformations in deriving the necessary data decompositions, and a formulation that aids in deriving heuristics for minimizing a communication when communication-free partitions are not feasible are presented.