The recent success of vector computers such as the Cray-1 and array processors such as those manufactured by Floating Point Systems has increased interest in making vector operations available to the FORTRAN programmer. The FORTRAN standards committee is currently considering a successor to FORTRAN 77, usually called FORTRAN 8x, that will permit the programmer to explicitly specify vector and array operations.

Although FORTRAN 8x will make it convenient to specify explicit vector operations in new programs, it does little for existing code. In order to benefit from the power of vector hardware, existing programs will need to be rewritten in some language (presumably FORTRAN 8x) that permits the explicit specification of vector operations. One way to avoid a massive manual recoding effort is to provide a translator that discovers the parallelism implicit in a FORTRAN program and automatically rewrites that program in FORTRAN 8x.

Such a translation from FORTRAN to FORTRAN 8x is not straightforward because FORTRAN DO loops are not always semantically equivalent to the corresponding FORTRAN 8x parallel operation. The semantic difference between these two constructs is precisely captured by the concept of dependence. A translation from FORTRAN to FORTRAN 8x preserves the semantics of the original program if it preserves the dependences in that program.

The theoretical background is developed here for employing data dependence to convert FORTRAN programs to parallel form. Dependence is defined and characterized in terms of the conditions that give rise to it; accurate tests to determine dependence are presented; and transformations that use dependence to uncover additional parallelism are discussed.
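The gap between a DO loop and a vector operation can be made concrete with a small sketch. The Python below is illustrative only: it is not FORTRAN 8x and is not taken from the paper, and the function names (`do_loop_forward`, `vector_forward`, `do_loop_backward`, `vector_backward`) are hypothetical. It assumes the usual reading of a vector assignment, in which the entire right-hand side is fetched before any element is stored. Under that assumption, a loop that reads a value written by an earlier iteration changes meaning when vectorized, while a loop that only reads values not yet overwritten does not.

```python
def do_loop_forward(a):
    """Sequential DO loop: A(I) = A(I-1) + 1, iterations run in order.

    Each iteration reads the element stored by the previous iteration,
    so a dependence is carried from one iteration to the next.
    """
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 1
    return a


def vector_forward(a):
    """Assumed vector semantics for the same statement.

    All right-hand-side operands are fetched from the old array
    before any result is stored.
    """
    a = list(a)
    rhs = [a[i - 1] + 1 for i in range(1, len(a))]
    a[1:] = rhs
    return a


def do_loop_backward(a):
    """Sequential DO loop: A(I) = A(I+1) + 1.

    Each iteration reads an element that a later iteration will
    overwrite, so every read sees the original value.
    """
    a = list(a)
    for i in range(len(a) - 1):
        a[i] = a[i + 1] + 1
    return a


def vector_backward(a):
    """Assumed vector semantics for A(I) = A(I+1) + 1."""
    a = list(a)
    rhs = [a[i + 1] + 1 for i in range(len(a) - 1)]
    a[:-1] = rhs
    return a


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    # The forward loop and its vector form disagree.
    print(do_loop_forward(data), vector_forward(data))
    # The backward loop and its vector form agree.
    print(do_loop_backward(data), vector_backward(data))
```

In the first pair the vector form reads stale values, so rewriting that loop as a single vector operation would not preserve the dependence and would change the program's result. In the second pair fetching before storing keeps every read ahead of the conflicting write, so the rewrite is safe. Deciding which case applies to a given loop is the job the abstract assigns to dependence tests.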