A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
Communications of the ACM
Flow diagrams, turing machines and languages with only two formation rules
Communications of the ACM
Control and data dependence for program transformations.
Control and data dependence for program transformations.
A vectorizing Fortran compiler
IBM Journal of Research and Development
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling C for vectorization, parallelization, and inline expansion
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Generating sequential code from parallel code
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Automatic discovery of parallelism: a tool and an experiment (extended abstract)
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Automatic vectorization of character string manipulation and relational operations in Pascal
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Loop distribution with arbitrary control flow
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A Control-Flow Normalization Algorithm and its Complexity
IEEE Transactions on Software Engineering
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
IEEE Transactions on Computers
Loop distribution with multiple exits
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
The transitive closure of control dependence: the iterated join
ACM Letters on Programming Languages and Systems (LOPLAS)
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The role of APL and J in high-performance computation
APL '93 Proceedings of the international conference on APL
Speculative disambiguation: a compilation technique for dynamic memory disambiguation
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Guarded execution and branch prediction in dynamic ILP processors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Characterizing the impact of predicated execution on branch prediction
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Resource-Constrained Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Modulo scheduling with multiple initiation intervals
Proceedings of the 28th annual international symposium on Microarchitecture
Register allocation for predicated code
Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of full and partial predicated execution support for ILP processors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Analysis techniques for predicated code
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Global predicate analysis and its application to register allocation
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Heuristics for register-constrained software pipelining
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
Compiler techniques for data synchronization in nested parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
Incremental dependence analysis for interactive parallelization
ICS '90 Proceedings of the 4th international conference on Supercomputing
GPMB—software pipelining branch-intensive loops
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Techniques for extracting instruction level parallelism on MIMD architectures
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing
A framework for balancing control flow and predication
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
Modulo Scheduling with Reduced Register Pressure
IEEE Transactions on Computers
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Parallel and Distributed Systems
The program decision logic approach to predicated execution
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Control CPR: a branch height reduction optimization for EPIC architectures
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Exploiting conditional instructions in code generation for embedded VLIW processors
DATE '99 Proceedings of the conference on Design, automation and test in Europe
POPL '85 Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
On the control dependence in the program dependence graph
CSC '88 Proceedings of the 1988 ACM sixteenth annual conference on Computer science
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication
International Journal of Parallel Programming
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
IEEE Transactions on Computers
An integrated approach to accelerate data and predicate computations in hyperblocks
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
Clustered VLIW architecture with predicated switching
Proceedings of the 38th annual Design Automation Conference
Power-aware modulo scheduling for high-performance VLIW processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
IEEE Transactions on Computers
A comparative study of modulo scheduling techniques
ICS '02 Proceedings of the 16th international conference on Supercomputing
Optimal software pipelining of loops with control flows
ICS '02 Proceedings of the 16th international conference on Supercomputing
Using predicate path information in hardware to determine true dependences
ICS '02 Proceedings of the 16th international conference on Supercomputing
Efficient static single assignment form for predication
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Folklore confirmed: reducible flow graphs are exponentially larger
POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Vectorizing Compiler for Multimedia Extensions
International Journal of Parallel Programming
Path Analysis and Renaming for Predicated Instruction Scheduling
International Journal of Parallel Programming
Control Flow Regeneration for Software Pipelined Loops with Conditions
International Journal of Parallel Programming
Synchronization and Communication Costs of Loop Partitioning on Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
A finite state machine based format model of software pipelined loops with conditions
Progress in computer research
Just-In-Time Java? Compilation for the Itanium® Processor
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Temporary Arrays for Distribution of Loops with Control Dependences
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Formal Verification of Explicitly Parallel Microprocessors
CHARME '99 Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Taxonomy and Description of Policy Combination Methods
POLICY '01 Proceedings of the International Workshop on Policies for Distributed Systems and Networks
Sea Cucumber: A Synthesizing Compiler for FPGAs
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Static Analysis for Guarded Code
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Predicate-aware scheduling: a technique for reducing resource constraints
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
UltraSPARC: Compiling for Maximum Floating Point Performance
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
A hierarchical basis for reordering transformations
POPL '84 Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
The program dependence graph in a software development environment
SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments
Exploiting compiler-generated schedules for energy savings in high-performance processors
Proceedings of the 2003 international symposium on Low power electronics and design
Rule-Based Building-Block Architectures for Policy-Based Networking
Journal of Network and Systems Management
Effectiveness of cross-platform optimizations for a java just-in-time compiler
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Register allocation for optimal loop scheduling
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Using Hammock Graphs to Structure Programs
IEEE Transactions on Software Engineering
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Probabilistic Predicate-Aware Modulo Scheduling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Time optimal software pipelining of loops with control flows
International Journal of Parallel Programming
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Superword-Level Parallelism in the Presence of Control Flow
Proceedings of the international symposium on Code generation and optimization
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A brief survey of program slicing
ACM SIGSOFT Software Engineering Notes
Complementing software pipelining with software thread integration
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Code Analysis for Temporal Predictability
Real-Time Systems
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
Proceedings of the International Symposium on Code Generation and Optimization
Compiling for EDGE Architectures
Proceedings of the International Symposium on Code Generation and Optimization
Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Selective predicate prediction for out-of-order processors
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Merging Head and Tail Duplication for Convergent Hyperblock Formation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Ablego: a function outlining and partial inlining framework: Research Articles
Software—Practice & Experience
A 64-bit stream processor architecture for scientific applications
Proceedings of the 34th annual international symposium on Computer architecture
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
Proceedings of the International Symposium on Code Generation and Optimization
Krakatoa: decompilation in java (dose bytecode reveal source?)
COOTS'97 Proceedings of the 3rd conference on USENIX Conference on Object-Oriented Technologies (COOTS) - Volume 3
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
IEEE Transactions on Computers
A time-predictable VLIW processor and its compiler support
Real-Time Systems
Improving the performance of object-oriented languages with dynamic predication of indirect jumps
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Implementation of a Coarse-Grained Reconfigurable Media Processor for AVC Decoder
Journal of Signal Processing Systems
Sketching concurrent data structures
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Retargetable code optimization for predicated execution
Proceedings of the conference on Design, automation and test in Europe
On the exploitation of loop-level parallelism in embedded applications
ACM Transactions on Embedded Computing Systems (TECS)
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
A SIMD optimization framework for retargetable compilers
ACM Transactions on Architecture and Code Optimization (TACO)
Optimizing techniques for saturated arithmetic with first-order linear recurrence
Proceedings of the 2009 ACM symposium on Applied Computing
Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
ACM Transactions on Architecture and Code Optimization (TACO)
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Identifying task-level parallelism by functional transformation with side-effect domains
Proceedings of the 47th Annual Southeast Regional Conference
Applying Data Mapping Techniques to Vector DSPs
Journal of Signal Processing Systems
Equivalence Checking of Static Affine Programs Using Widening to Handle Recurrences
CAV '09 Proceedings of the 21st International Conference on Computer Aided Verification
A Single-Path Chip-Multiprocessor System
SEUS '09 Proceedings of the 7th IFIP WG 10.2 International Workshop on Software Technologies for Embedded and Ubiquitous Systems
The Fortran parallel transformer and its programming environment
Information Sciences: an International Journal
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Application of if-conversion to verification and optimization of workflows
Programming and Computing Software
AnySL: efficient and portable shading for ray tracing
Proceedings of the Conference on High Performance Graphics
How many threads to spawn during program multithreading?
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
A unifying theory of control dependence and its application to arbitrary program structures
Theoretical Computer Science
Boosting the performance of multimedia applications using SIMD instructions
CC'05 Proceedings of the 14th international conference on Compiler Construction
Verification of source code transformations by program equivalence checking
CC'05 Proceedings of the 14th international conference on Compiler Construction
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Extending a C-like language for portable SIMD programming
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Optimizing stencil application on multi-thread GPU architecture using stream programming model
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Strategies for predicate-aware register allocation
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Improving performance of OpenCL on CPUs
CC'12 Proceedings of the 21st international conference on Compiler Construction
CAPRI: prediction of compaction-adequacy for handling control-divergence in GPGPU architectures
Proceedings of the 39th Annual International Symposium on Computer Architecture
Functional programs that explain their work
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Compiling for time predictability
SAFECOMP'12 Proceedings of the 2012 international conference on Computer Safety, Reliability, and Security
Interleaving and lock-step semantics for analysis and verification of GPU kernels
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Intermediate representations in imperative compilers: A survey
ACM Computing Surveys (CSUR)
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation
Proceedings of the 40th Annual International Symposium on Computer Architecture
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Allocating rotating registers by scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Sierra: a SIMD extension for C++
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.02 |
Program analysis methods, especially those which support automatic vectorization, are based on the concept of interstatement dependence where a dependence holds between two statements when one of the statements computes values needed by the other. Powerful program transformation systems that convert sequential programs to a form more suitable for vector or parallel machines have been developed using this concept [AllK 82, KKLW 80].The dependence analysis in these systems is based on data dependence. In the presence of complex control flow, data dependence is not sufficient to transform programs because of the introduction of control dependences. A control dependence exists between two statements when the execution of one statement can prevent the execution of the other. Control dependences do not fit conveniently into dependence-based program translators.One solution is to convert all control dependences to data dependences by eliminating goto statements and introducing logical variables to control the execution of statements in the program. In this scheme, action statements are converted to IF statements. The variables in the conditional expression of an IF statement can be viewed as inputs to the statement being controlled. The result is that control dependences between statements become explicit data dependences expressed through the definitions and uses of the controlling logical variables.This paper presents a method for systematically converting control dependences to data dependences in this fashion. The algorithms presented here have been implemented in PFC, an experimental vectorizer written at Rice University.