A coarse-grained reconfigurable architecture (CGRA) typically consists of an array of processing elements (PEs) controlled by a centralized unit. Because every PE follows the same control flow, programs whose control diverges among PEs are difficult to execute without predication. Conventional predication techniques, however, degrade both performance and power consumption: they require longer instruction words, and they fetch, decode, and then nullify instructions whose results are never used. This article identifies performance and power issues in predicated execution that have not yet been adequately addressed, and it proposes fast, power-efficient predication mechanisms. Experiments conducted through gate-level simulation show that the proposed mechanisms improve the energy-delay product by 11.9% to 23.8% on average.
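To make the overhead of conventional predication concrete, the following is a minimal sketch of lockstep predicated execution. It is not the mechanism proposed in the article. The names `Instr`, `run_lockstep`, and `PROGRAM`, the two opcodes, and the predicate `x < 0` are all assumptions made for illustration. A single instruction stream is issued to every PE, and each PE discards the instructions whose predicate does not match its own.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    op: str                # hypothetical opcode: "add" or "sub"
    imm: int               # immediate operand
    pred: Optional[bool]   # predicate value required to commit; None = unconditional


def run_lockstep(values: List[int], program: List[Instr]) -> Tuple[List[int], int]:
    """Run one shared instruction stream on every PE.

    Returns the final per-PE values and the number of nullified issue slots.
    """
    preds = [v < 0 for v in values]  # each PE computes its own predicate p = (x < 0)
    regs = list(values)
    nullified = 0
    for instr in program:            # the central unit issues each instruction to all PEs
        for pe in range(len(regs)):
            if instr.pred is not None and instr.pred != preds[pe]:
                nullified += 1       # issued to this PE, but its result is discarded
                continue
            if instr.op == "add":
                regs[pe] += instr.imm
            elif instr.op == "sub":
                regs[pe] -= instr.imm
    return regs, nullified


# if (x < 0) x = x + 10; else x = x - 1;   written as predicated instructions
PROGRAM = [
    Instr("add", 10, pred=True),
    Instr("sub", 1, pred=False),
]

results, wasted = run_lockstep([-3, 5, -1, 2], PROGRAM)
print(results, wasted)  # [7, 4, 9, 1] 4
```

In this toy model, half of the eight issue slots are nullified. Each `Instr` also carries a predicate field on top of its opcode and operand, which loosely mirrors the longer instruction words mentioned above.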