On data synchronization for multiprocessors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
GTS: parallelization and vectorization of tight recurrences
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
An approach to ordering optimizing transformations
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Vectorization and parallelization of irregular problems via graph coloring
ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts
ICS '91 Proceedings of the 5th international conference on Supercomputing
Optimization of array accesses by collective loop transformations
ICS '91 Proceedings of the 5th international conference on Supercomputing
Semantical interprocedural parallelization: an overview of the PIPS project
ICS '91 Proceedings of the 5th international conference on Supercomputing
Experiences with data dependence abstractions
ICS '91 Proceedings of the 5th international conference on Supercomputing
Extending the I test to direction vectors
ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization
ICS '91 Proceedings of the 5th international conference on Supercomputing
PATCH—a new algorithm for rapid incremental dependence analysis
ICS '91 Proceedings of the 5th international conference on Supercomputing
A unified framework for systematic loop transformations
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Removal of redundant dependences in DOACROSS loops with constant dependences
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Generating explicit communication from shared-memory program references
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Performing data flow analysis in parallel
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Subdomain dependence test for massive parallelism
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Efficient and exact data dependence analysis
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Size and access inference for data-parallel programs
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Techniques for debugging parallel programs with flowback analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Debugging parallelized code using code liberation techniques
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiler optimizations for Fortran D on MIMD distributed-memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Retire Fortran? A debate rekindled
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Seismic modeling at 14 gigaflops on the connection machine
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Interprocedural transformations for parallel code generation
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Instruction-level parallelism in Prolog: analysis and architectural support
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
Eliminating false data dependences using the Omega test
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A comprehensive approach to parallel data flow analysis
ICS '92 Proceedings of the 6th international conference on Supercomputing
On exact data dependence analysis
ICS '92 Proceedings of the 6th international conference on Supercomputing
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
A transformational approach to compiling Sisal for distributed memory architectures
ICS '92 Proceedings of the 6th international conference on Supercomputing
Access normalization: loop restructuring for NUMA compilers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Non-unimodular transformations of nested loops
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Low copy message passing on the Alliant CAMPUS/800
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Chores: enhanced run-time support for shared-memory parallel computing
ACM Transactions on Computer Systems (TOCS)
Interprocedural modification side effect analysis with pointer aliasing
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A practical data flow framework for array reference analysis and its use in optimizations
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
Compiling machine-independent parallel programs
ACM SIGPLAN Notices
CMAX: a Fortran translator for the connection machine system
ICS '93 Proceedings of the 7th international conference on Supercomputing
Partitioning the statement per iteration space using non-singular matrices
ICS '93 Proceedings of the 7th international conference on Supercomputing
Compilation techniques for sparse matrix computations
ICS '93 Proceedings of the 7th international conference on Supercomputing
Partitioning the global space for distributed memory systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Advanced compiler optimizations for sparse computations
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Data flow analysis for parallel programs
CSC '93 Proceedings of the 1993 ACM conference on Computer science
Compiling nested data-parallel programs for shared-memory multiprocessors
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unified compilation of Fortran 77D and 90D
ACM Letters on Programming Languages and Systems (LOPLAS)
Program optimization and parallelization using idioms
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallelizing complex scans and reductions
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ICS '94 Proceedings of the 8th international conference on Supercomputing
ICS '94 Proceedings of the 8th international conference on Supercomputing
Exploiting cache affinity in software cache coherence
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler and runtime support for out-of-core HPF programs
ICS '94 Proceedings of the 8th international conference on Supercomputing
Static analysis of upper and lower bounds on dependences and parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compilation of out-of-core data parallel programs for distributed memory machines
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Dynamic memory disambiguation for array references
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Fusing loops with backward inter loop data dependence
ACM SIGPLAN Notices
Instruction scheduling in the TOBEY compiler
IBM Journal of Research and Development
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
An extended form of must alias analysis for dynamic allocation
POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced Array Optimizations for High Performance Functional Languages
IEEE Transactions on Parallel and Distributed Systems
Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters
IEEE Transactions on Parallel and Distributed Systems
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
An empirical study of precise interprocedural array analysis
Scientific Programming
Flattening and parallelizing irregular, recurrent loop nests
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Extracting task-level parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Compiler cache optimizations for banded matrix problems
ICS '95 Proceedings of the 9th international conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
Run-time methods for parallelizing partially parallel loops
ICS '95 Proceedings of the 9th international conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Vectorization beyond data dependences
ICS '95 Proceedings of the 9th international conference on Supercomputing
Practical approach to single assignment code
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Translation of serial recursive codes to parallel SIMD codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
IEEE Transactions on Parallel and Distributed Systems
Automatic Data Structure Selection and Transformation for Sparse Matrix Computations
IEEE Transactions on Parallel and Distributed Systems
Symbolic analysis for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Anticipatory instruction scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing
Data-localization for Fortran macro-dataflow computation using partial static task assignment
ICS '96 Proceedings of the 10th international conference on Supercomputing
The future of program analysis
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Achieving Full Parallelism Using Multidimensional Retiming
IEEE Transactions on Parallel and Distributed Systems
Loop Transformations for Fault Detection in Regular Loops on Massively Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
On the perfect accuracy of an approximate subscript analysis test
ICS '90 Proceedings of the 4th international conference on Supercomputing
Joint Minimization of Code and Data for Synchronous DataflowPrograms
Formal Methods in System Design
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Alias analysis of executable code
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Simulation/evaluation environment for a VLIW processor architecture
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Loop fusion in high performance Fortran
ICS '98 Proceedings of the 12th international conference on Supercomputing
Constraint-based array dependence analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
IEEE Transactions on Parallel and Distributed Systems
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
An efficient message-passing scheduler based on guided self scheduling
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Statically Safe Speculative Execution for Real-Time Systems
IEEE Transactions on Software Engineering
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Timing Analysis for Data and Wrap-Around Fill Caches
Real-Time Systems
From flop to megaflops: Java for technical computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel Solutions of Simple Indexed Recurrence Equations
IEEE Transactions on Parallel and Distributed Systems
Register-sensitive selection, duplication, and sequencing of instructions
ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop parallelization algorithms
Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
A schema for interprocedural modification side-effect analysis with pointer aliasing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic partitioning and virtual scheduling for efficient parallel execution
ACM-SE 30 Proceedings of the 30th annual Southeast regional conference
Efficient Parallel Execution of Irregular Recursive Programs
IEEE Transactions on Parallel and Distributed Systems
Compiler Support for Scalable and Efficient Memory Systems
IEEE Transactions on Computers
Compiling stencils in high performance Fortran
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A compiler approach to fast hardware design space exploration in FPGA-based systems
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Immutability specification and its applications
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Synthesis of Embedded Software from Synchronous Dataflow Specifications
Journal of VLSI Signal Processing Systems
Sunder: a programmable hardware prefetch architecture for numerical loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Expressing cross-loop dependencies through hyperplane data dependence analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Enabling unimodular transformations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Relative Debugging of Automatically Parallelized Programs
Automated Software Engineering
International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Removal of Redundant Dependences in DOACROSS Loops with Constant Dependences
IEEE Transactions on Parallel and Distributed Systems
Interactive Parallel Programming using the ParaScope Editor
IEEE Transactions on Parallel and Distributed Systems
The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization
IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence
IEEE Transactions on Parallel and Distributed Systems
Dependence Uniformization: A Loop Parallelization Technique
IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
Program Structuring for Effective Parallel Portability
IEEE Transactions on Parallel and Distributed Systems
Loop Coalescing and Scheduling for Barrier MIMD Architectures
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
On Loop Transformations for Generalized Cycle Shrinking
IEEE Transactions on Parallel and Distributed Systems
Constructive Methods for Scheduling Uniform Loop Nests
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
The Classification, Fusion, and Parallelization of Array Language Primitives
IEEE Transactions on Parallel and Distributed Systems
Loop Transformation Using Nonunimodular Matrices
IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays
IEEE Transactions on Parallel and Distributed Systems
Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
From Flop to MegaFlops: Java for Technical Computing
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Irregular Assignment Computations on cc-NUMA Multiprocessors
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Efficient Dependence Analysis for Java Arrays
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Transformations on Doubly Nested Loops
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Efficient Execution of Doacross Loops on Distributed Memory Systems
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Software Pipelining: Petri Net Pacemaker
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
Techniques for Reducing the Overhead of Run-Time Parallelization
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Advanced Scalarization of Array Syntax
CC '00 Proceedings of the 9th International Conference on Compiler Construction
A Technique for FPGA Synthesis Driven by Automatic Source Code Analysis and Transformations
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Loop Transformations for Hierarchical Parallelism and Locality
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Speculative Parallelization of Partially Parallel Loops
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Practical parallel computing
Cluster computing with message-passing interface
Highly parallel computaions
Optimized software synthesis for synchronous dataflow
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
A Loop Transformation for Maximizing Parallelism from Single Loops with Nonuniform Dependencies
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A transformation method to reduce loop overhead in HPF compiler
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Simulation of aerodynamics problem on a distributed shared-memory machine
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Detection of Implicit Parallelisms in the Task Parallel Language
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A New Transformation Method to Generate Optimized DO Loop from FORALL Construct
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Sourcebook of parallel computing
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Local supercomputing training in the computational sciences using remote national centers
Future Generation Computer Systems - Special issue: Selected papers from the workshop on education in computational sciences held at the ICCS 2002
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
What can we gain by unfolding loops?
ACM SIGPLAN Notices
High performance air pollution modeling for a power plant environment
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
High performance air pollution simulation on shared memory systems
High performance scientific and engineering computing
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
High Performance Air Pollution Simulation Using OpenMP
The Journal of Supercomputing
Interprocedural dependence analysis and parallelization
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Evaluating heuristics in automatically mapping multi-loop applications to FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems
Automatic blocking of QR and LU factorizations for locality
MSP '04 Proceedings of the 2004 workshop on Memory system performance
A novel approach for partitioning iteration spaces with variable densities
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion
International Journal of High Performance Computing Applications
Proceedings of the sixth international symposium on Automated analysis-driven debugging
Optimizing inter-processor data locality on embedded chip multiprocessors
Proceedings of the 5th ACM international conference on Embedded software
Efficient Techniques for Advanced Data Dependence Analysis
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Software integrity protection using timed executable agents
ASIACCS '06 Proceedings of the 2006 ACM Symposium on Information, computer and communications security
Information Processing Letters
A general approach for partitioning N-dimensional parallel nested loops with conditionals
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
May-happen-in-parallel analysis of X10 programs
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
NUMACROS: data parallel programming on NUMA multiprocessors
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions
Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Revisiting Cache Block Superloading
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Cache-aware partitioning of multi-dimensional iteration spaces
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Information Processing Letters
Paper: A comparative study of automatic vectorizing compilers
Parallel Computing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartII
Polynomial time array dataflow analysis
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Verification by parallelization of parametric code
Algebraic and proof-theoretic aspects of non-classical logics
McFLAT: a profile-based framework for MATLAB loop analysis and transformations
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
How many threads to spawn during program multithreading?
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Applying data copy to improve memory performance of general array computations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Parallelization of utility programs based on behavior phase analysis
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Automating verification of loops by parallelization
LPAR'06 Proceedings of the 13th international conference on Logic for Programming, Artificial Intelligence, and Reasoning
An inspector-executor algorithm for irregular assignment parallelization
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Techniques for the parallelization of unstructured grid applications on multi-GPU systems
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Matrix-Based programming optimization for improving memory hierarchy performance on imagine
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Efficient parallel implementation of sequence analysis algorithms using a global address space model
Mathematical and Computer Modelling: An International Journal
Optimization techniques for efficient HTA programs
Parallel Computing
Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A survey of pipelined workflow scheduling: Models and algorithms
ACM Computing Surveys (CSUR)
Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation
International Journal of Parallel Programming
Hi-index | 0.02 |
From the Publisher:Effective use of a supercomputer requires users to have a good algorithm and to express this algorithm in an appropriate language, and requires compilers to generate efficient code. This book investigates several problems facing compiler design for supercomputers, including building efficient and comprehensive data dependence graphs, recurrence relations, the management of compiler temporary variables, and WHILE loops. The book first proposes an efficient means of representing the flow of data in a program by labeling the arcs in a data dependence graph with "direction vectors" to show how the flow of data corresponds to the loop structure of the program. These data dependence direction vectors are then used in several high level compiler loop optimizations: loop vectorization, loop concurrentization, loop fusion, and loop interchanging. The book shows how to perform these transformations and how to use them to optimize programs for a wide range of supercomputers. The problems of recurrence relations studied include arithmetic recurrences with IF statements and recurrences involving both data and control dependence relations in a cycle. The wavefront method of solving recurrences is also treated. The book discusses ways to make the problem of managing temporary arrays more tractable. It concludes by offering several methods for executing WHILE loops and describes a general structure of an optimizing compiler for supercomputers developed from the author's experience with a test bed compiler. Michael Wolfe is Associate Professor in the Computer Science and Engineering Department at the Oregon Graduate Center Optimizing Supercompilers forSupercomputers is included in the series Research Monographs in Parallel Computing. Copublished with Pitman Publishing.