An algorithm for transforming sequential programs into equivalent parallel programs is presented. The method concentrates on finding loops whose separate iterations can be run in parallel without synchronization. Although a simple version of the method can be shown to be optimal, the problem of generating optimal code when loop interchange is employed is shown to be intractable. These methods are implemented in an experimental translation system developed at Rice University.
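As a rough illustration of what it means for a loop's separate iterations to run in parallel without synchronization, the sketch below checks by brute force whether any iteration touches a memory location written by a different iteration. It is not the algorithm from the paper: the function name `has_loop_carried_dependence` and the `writes`/`reads` callbacks are hypothetical, and a real compiler would decide this symbolically through dependence analysis rather than by enumerating iterations.

```python
def has_loop_carried_dependence(n, writes, reads):
    """Return True if some location is written by one iteration and
    written or read by a different one (brute force over 0..n-1).

    writes(i) and reads(i) are hypothetical callbacks that list the
    locations iteration i writes and reads, e.g. ("a", i).
    """
    written_by = {}
    for i in range(n):
        for loc in writes(i):
            if loc in written_by and written_by[loc] != i:
                return True  # two different iterations write the same location
            written_by[loc] = i
    for i in range(n):
        for loc in reads(i):
            writer = written_by.get(loc)
            if writer is not None and writer != i:
                return True  # value flows between different iterations
    return False


# Loop body a[i] = b[i] + 1: every iteration touches its own elements.
independent = has_loop_carried_dependence(
    8,
    writes=lambda i: [("a", i)],
    reads=lambda i: [("b", i)],
)

# Loop body a[i] = a[i-1] + 1: iteration i reads what iteration i-1 wrote.
recurrence = has_loop_carried_dependence(
    8,
    writes=lambda i: [("a", i)],
    reads=lambda i: [("a", i - 1)],
)
```

Under these assumptions, the first loop is a candidate for running its iterations in parallel with no synchronization, while the second carries a dependence from each iteration to the next and is not.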