This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations, and it optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs that operate over dense matrices. The programs were originally hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives and use known, implemented dependence and interprocedural analyses to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program and show that, on a shared-memory, bus-based parallel machine with local caches, our algorithm improves three programs in our test suite, matches four, and degrades one. This experiment suggests that existing dependence and interprocedural array analysis can automatically detect user parallelism. It also demonstrates that user-parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to use parallel machines effectively.
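To illustrate how a simple cache model can drive a locality optimization, the following is a minimal sketch and not the algorithm from the paper. It assumes column-major (Fortran) array storage, a hypothetical cache line holding four array elements, and a hypothetical trip count of 100 for every loop. The names `innermost_cost` and `memory_order` are invented for this example. Each loop is scored by the number of cache lines the loop body would touch if that loop ran innermost, and the loops are then ordered so that the cheapest one is innermost.

```python
# Hypothetical parameters, chosen only for illustration.
CACHE_LINE_ELEMS = 4   # array elements per cache line
TRIP_COUNT = 100       # iterations of every loop

def innermost_cost(loop, refs):
    """Estimate cache lines touched by all array references
    when `loop` is placed innermost.

    Each reference is a tuple of loop-index names, one per array
    dimension, with the first subscript contiguous in memory
    (column-major, as in Fortran).
    """
    total = 0.0
    for subs in refs:
        if loop not in subs:
            # Loop-invariant reference: the same line is reused.
            total += 1
        elif subs[0] == loop and loop not in subs[1:]:
            # Unit stride: one new line every CACHE_LINE_ELEMS iterations.
            total += TRIP_COUNT / CACHE_LINE_ELEMS
        else:
            # Non-unit stride: assume a new line on every iteration.
            total += TRIP_COUNT
    return total

def memory_order(loops, refs):
    """Order loops from outermost to innermost by descending cost,
    so the loop with the best locality ends up innermost."""
    return sorted(loops, key=lambda l: innermost_cost(l, refs), reverse=True)

# Matrix multiply: C(i,j) = C(i,j) + A(i,k) * B(k,j)
matmul_refs = [("i", "j"), ("i", "k"), ("k", "j")]
order = memory_order(["i", "j", "k"], matmul_refs)
print(order)
```

Under these assumptions the sketch places `i` innermost, because `C(i,j)` and `A(i,k)` are then accessed with unit stride and `B(k,j)` is loop-invariant. A real compiler would still need dependence analysis to confirm that the permutation is legal, and it would weigh the chosen order against parallelism and its granularity, which this sketch does not model.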