Over the past several years, strategies to increase microprocessor performance have focused on finding more instruction-level parallelism (ILP), that is, on finding several instructions that can execute at the same time. By providing multiple functional units on which to execute instructions, computer architects expect to improve performance. However, two difficult problems limit ILP: branch instructions, which introduce control dependencies, and memory latency, the time it takes to retrieve data from memory. In the absence of new programming languages that are explicitly parallel, the task of "exposing" ILP falls to the compiler. In IA-64, Intel's upcoming 64-bit architecture, the compiler will play a pivotal role in using predication and control speculation to expose more ILP. To illustrate predication and control speculation, this article presents two code fragments, scheduled with actual IA-64 instructions, that are representative of general-purpose integer code such as that found in computer-aided design and database applications. A comparison of performance with and without the two features demonstrates how predication and control speculation can yield a significant reduction in the number of cycles required to execute these fragments.
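As a rough illustration of the two features, the following C sketch models them at source level. It is not IA-64 code and not one of the article's fragments. All function and type names are hypothetical, and the null-pointer check in `speculative_load` is an assumed stand-in for hardware that defers a faulting load instead of raising the exception immediately. The first pair of functions shows predication as if-conversion: both arms are computed and a predicate selects the result, so no branch is needed. The second pair shows control speculation: the load is hoisted above the branch that guards it, and a deferred-fault flag is checked only on the path that actually uses the value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branching form: a control dependence decides which arm executes. */
uint32_t pick_branching(uint32_t a, uint32_t b, uint32_t x, uint32_t y) {
    if (a < b) {
        return x + 1u;
    }
    return y - 1u;
}

/* If-converted form (models predication): both arms are computed
 * unconditionally and the predicate p selects one result. */
uint32_t pick_predicated(uint32_t a, uint32_t b, uint32_t x, uint32_t y) {
    uint32_t p = (uint32_t)(a < b);   /* predicate: 1 or 0 */
    uint32_t mask = 0u - p;           /* all ones if p is 1, else zero */
    uint32_t then_val = x + 1u;
    uint32_t else_val = y - 1u;
    return (then_val & mask) | (else_val & ~mask);
}

/* Result of a modeled speculative load: a value plus a flag that
 * records a deferred fault instead of raising it. */
typedef struct {
    uint32_t value;
    bool deferred;
} spec_load;

/* Assumption: a null pointer stands in for any faulting address. */
static spec_load speculative_load(const uint32_t *p) {
    spec_load r = {0u, true};
    if (p != NULL) {
        r.value = *p;
        r.deferred = false;
    }
    return r;
}

/* Original form: the load runs only after the branch is resolved. */
uint32_t load_if_branching(bool cond, const uint32_t *p, uint32_t dflt) {
    if (cond) {
        return *p;
    }
    return dflt;
}

/* Control-speculated form: the load is hoisted above the branch.
 * On the path that uses the value, a deferred fault triggers recovery,
 * which repeats the load non-speculatively (and faults exactly where
 * the original code would have). */
uint32_t load_if_speculated(bool cond, const uint32_t *p, uint32_t dflt) {
    spec_load s = speculative_load(p);   /* issued early to hide latency */
    if (cond) {
        if (s.deferred) {
            return *p;                   /* recovery path */
        }
        return s.value;
    }
    return dflt;
}
```

Each speculated or predicated function is intended to return the same result as its branching counterpart. Calling `load_if_speculated(false, NULL, 7)` shows the point of deferring the fault: the hoisted load touches a bad address, but because the guarded path is never taken, the fault is never raised.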