Theory of linear and integer programming
Theory of linear and integer programming
International Journal of Parallel Programming
International Journal of Parallel Programming
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
ICS '88 Proceedings of the 2nd international conference on Supercomputing
An architecture independent programming language for low-level vision
Computer Vision, Graphics, and Image Processing
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compile-time generation of regular communications patterns
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Array-data flow analysis and its use in array privatization
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Accurate analysis of array references
Accurate analysis of array references
Improving locality and parallelism in nested loops
Improving locality and parallelism in nested loops
An optimizing Fortran D compiler for MIMD distributed-memory machines
An optimizing Fortran D compiler for MIMD distributed-memory machines
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs
IEEE Transactions on Parallel and Distributed Systems
Integrating Scalar Optimization and Parallelization
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Lazy array data-flow dependence analysis
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ICS '94 Proceedings of the 8th international conference on Supercomputing
Static analysis of upper and lower bounds on dependences and parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication optimizations for parallel computing using data access information
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers
ICS '95 Proceedings of the 9th international conference on Supercomputing
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing
Minimizing communication while preserving parallelism
ICS '96 Proceedings of the 10th international conference on Supercomputing
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
Using integer sets for data-parallel program analysis and optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Communication optimizations for parallel C programs
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Problem and machine sensitive communication optimization
ICS '98 Proceedings of the 12th international conference on Supercomputing
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
ACM Transactions on Computer Systems (TOCS)
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
A balanced code placement framework
ACM Transactions on Programming Languages and Systems (TOPLAS)
Global optimization techniques for automatic parallelization of hybrid applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Static Single Assignment Form for Message-Passing Programs
International Journal of Parallel Programming
Contention elimination by replication of sequential sections in distributed shared memory programs
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A framework for global communication analysis of optimizations
Compiler optimizations for scalable parallel systems
Advanced code generation for high performance Fortran
Compiler optimizations for scalable parallel systems
Supporting dynamic data structures with Olden
Compiler optimizations for scalable parallel systems
The Efficient Computation of Ownership Sets in HPF
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Expressing cross-loop dependencies through hyperplane data dependence analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Temporal Characterization of Demands for Data Movement on Parallel Programs
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compiler Synthesis of Task Graphs for Parallel Program Performance Prediction
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
The Plan-Du Style Compilation Technique for Eager Data Transfer in Thread-Based Execution
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Toward Compiler Support for Scalable Parallelism Using Multipartitioning
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Accurate Data and Context Management in Message-Passing Programs
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Generating Realignment-Based Communication for HPF Programs
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compiler-generated communication for pipelined FPGA applications
Proceedings of the 40th annual Design Automation Conference
CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Template-based program restructuring - initial experience
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
EPPP - an integrated environment for portable parallel programming
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Automatic decomposition in EPPP compiler
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Sourcebook of parallel computing
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
Linear data distribution based on index analysis
High performance scientific and engineering computing
Evaluating heuristics in automatically mapping multi-loop applications to FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Compiler-directed proactive power management for networks
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
2D data locality: definition, abstraction, and application
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Message-passing code generation for non-rectangular tiling transformations
Parallel Computing
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Communication optimizations for global multi-threaded instruction scheduling
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Extracting synchronization-free threads in perfectly nested loops using the omega project software
SEPADS'05 Proceedings of the 4th WSEAS International Conference on Software Engineering, Parallel & Distributed Systems
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops
ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Efficient, portable implementation of asynchronous multi-place programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Slicing based code parallelization for minimizing inter-processor communication
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Extracting synchronization-free slices of operations in perfectly-nested loops
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Finding synchronization-free parallelism for non-uniform loops
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartII
Finding coarse grained parallelism in computational geometry algorithms
ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: PartIII
Automatic parallelization of simulink applications
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Reducing task creation and termination overhead in explicitly parallel programs
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Exploiting single-assignment properties to optimize message-passing programs by code transformations
IFL'04 Proceedings of the 16th international conference on Implementation and Application of Functional Languages
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic speculative DOALL for clusters
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 6th International Systems and Storage Conference
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Generating efficient data movement code for heterogeneous architectures with distributed-memory
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
ASC: automatically scalable computation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Automatic data allocation and buffer management for multi-GPU machines
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generated the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses.Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines if two dynamic instances refer to the same value, and not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.Our technique is based on a uniform framework, where data decompositions, computation decompositions and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.