Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA

Authors:
Kyuseung Han; Junwhan Ahn; Kiyoung Choi
Affiliations:
Seoul National University; Seoul National University; Seoul National University
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

A coarse-grained reconfigurable architecture (CGRA) typically has an array of processing elements (PEs) controlled by a centralized unit. Because all PEs follow the same control flow, programs whose control flow diverges among PEs are difficult to execute without predication. However, conventional predication techniques hurt both performance and power consumption because of longer instruction words and unnecessary instruction fetching, decoding, and nullifying steps. This article identifies performance and power issues in predicated execution that have not yet been well addressed, and it proposes fast, power-efficient predication mechanisms. Experiments conducted through gate-level simulation show that the proposed mechanisms improve the energy-delay product by 11.9% to 23.8% on average.
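
To make the cost of conventional predication concrete, the following is a minimal sketch of lockstep predicated execution, not the mechanism proposed in the article. It assumes a hypothetical kernel `y = 2*x if x > 0 else -x`, one PE per input, and a central controller that issues the same instruction to every PE; the function name `run_predicated` and the slot-counting model are illustrative assumptions.

```python
def run_predicated(inputs):
    """Lockstep execution of `y = 2*x if x > 0 else -x`, one PE per input.

    Hypothetical model: every PE receives the same instruction stream, and a
    per-PE predicate decides whether each result is committed or nullified.
    """
    outputs = [None] * len(inputs)
    nullified = 0  # instruction slots that were issued but then discarded

    # Step 1: all PEs compute their own predicate.
    preds = [x > 0 for x in inputs]

    # Step 2: the "then" instruction is issued to all PEs, guarded by p.
    for pe, x in enumerate(inputs):
        if preds[pe]:
            outputs[pe] = 2 * x
        else:
            nullified += 1

    # Step 3: the "else" instruction is issued to all PEs, guarded by not p.
    for pe, x in enumerate(inputs):
        if not preds[pe]:
            outputs[pe] = -x
        else:
            nullified += 1

    return outputs, nullified


outputs, nullified = run_predicated([3, -1, 0, 5])
print(outputs, nullified)
```

In this model every PE spends one slot on the branch it does not take, so four PEs discard four of the eight issued branch instructions. These are the fetched, decoded, and then nullified instructions that the abstract describes as wasted work.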