A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly true on wide-issue architectures with large register banks, when the memory hierarchy (locality and bandwidth) is not the dominant bottleneck. We consider two real applications, one from computational biology and one from cryptanalysis, characterized by long sequences of dependent instructions, irregular control flow, and intricate scalar and array dependence patterns. Although these applications exhibit excellent memory locality and branch-prediction behavior, state-of-the-art loop transformations and back-end optimizations are unable to exploit much instruction-level parallelism. We show that good speedups can be achieved through deep jam, a new transformation of the program control flow and data flow. Deep jam combines scalar and array renaming with a generalized form of recursive unroll-and-jam. It brings together independent instructions across irregular control structures and removes memory-based dependences. This optimization contributes to the extraction of fine-grain parallelism in irregular applications. We propose a feedback-directed deep jam algorithm that selects a jamming strategy as a function of the architecture and application characteristics.