A comparison of full and partial predicated execution support for ILP processors

Authors:
Scott A. Mahlke; Richard E. Hank; James E. McCormick; David I. August; Wen-Mei W. Hwu
Affiliations:
Scott A. Mahlke: Hewlett Packard Laboratories, Palo Alto, CA, and Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL; Richard E. Hank, James E. McCormick, David I. August, Wen-Mei W. Hwu: Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Venue:
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Year:
1995

Abstract

Predicated execution can be used effectively to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in designing an instruction set to support it can be difficult. At one end of the design spectrum, architectural support for full predicated execution requires adding a source operand (the predicate) to every instruction. Full predicate support provides the most flexibility and the largest potential performance improvement. At the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study that addresses, both qualitatively and quantitatively, the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large, control-intensive programs. Some details of the code generation techniques are presented to give insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: for an 8-issue processor, partial predication provides an average performance improvement of 33% over the same processor with no predicate support, while full predication provides an additional 30% improvement.
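To make the contrast between the two ends of the design spectrum concrete, here is a minimal Python model of if-converting one small branch. It is a sketch under our own assumptions, not the paper's code generator: the helpers `cmov` and `guarded`, and the dictionary standing in for a register file, are hypothetical stand-ins for hardware behavior. With only a conditional move, both arms of the branch are computed unconditionally and every side effect must be rewritten as a select. With full predication, each operation carries an extra predicate source operand and is nullified when that predicate is false.

```python
def with_branch(a: int, b: int, count: int) -> tuple[int, int]:
    """Reference version: a conditional branch decides which statements run."""
    if a > b:
        x = a - b
        count += 1          # side effect only on the taken path
    else:
        x = b - a
    return x, count


def cmov(cond: bool, new: int, old: int) -> int:
    """Hypothetical model of a conditional move: both operands are already
    computed; only the choice of which value lands in the destination depends on cond."""
    return new if cond else old


def with_cmov(a: int, b: int, count: int) -> tuple[int, int]:
    """Partial predication: both arms execute speculatively, then selects
    pick the live results. The update of count also becomes a select."""
    p = a > b                          # compare produces a condition value
    t_then = a - b                     # then-arm, computed speculatively
    t_else = b - a                     # else-arm, computed speculatively
    x = cmov(p, t_then, t_else)
    count = cmov(p, count + 1, count)
    return x, count


def guarded(pred: bool, regs: dict[str, int], dst: str, value: int) -> None:
    """Hypothetical model of a fully predicated instruction: it writes dst
    only if its predicate operand is true. (Python evaluates `value` eagerly;
    real hardware would simply nullify the instruction.)"""
    if pred:
        regs[dst] = value


def with_full_pred(a: int, b: int, count: int) -> tuple[int, int]:
    """Full predication: a predicate define sets complementary predicates,
    and each original instruction is guarded instead of branched around."""
    regs = {"x": 0, "count": count}
    p_t = a > b                        # predicate for the then-arm
    p_f = not p_t                      # complementary predicate for the else-arm
    guarded(p_t, regs, "x", a - b)
    guarded(p_t, regs, "count", regs["count"] + 1)
    guarded(p_f, regs, "x", b - a)
    return regs["x"], regs["count"]


if __name__ == "__main__":
    # All three forms must agree on every input.
    for a in range(-3, 4):
        for b in range(-3, 4):
            expected = with_branch(a, b, 10)
            assert with_cmov(a, b, 10) == expected
            assert with_full_pred(a, b, 10) == expected
    print("all three versions agree")
```

The model shows why conditional moves suffice only when both arms can be executed speculatively: the then-arm's side effect had to be recast as a select. Full predication instead guards the original operations directly, which is what allows instructions that are unsafe to speculate to be placed under a predicate.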