IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Exploiting instruction level parallelism in the presence of conditional branches
Exploiting instruction level parallelism in the presence of conditional branches
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Hierarchical reconfigurable computing arrays for efficient CGRA-based embedded systems
Proceedings of the 46th Annual Design Automation Conference
Coarse-grained reconfigurable architecture for multiple application domains: a case study
Proceedings of the 2009 International Conference on Hybrid Information Technology
Journal of Signal Processing Systems
A Bit-Rate Aware Scalable H.264/AVC Deblocking Filter Using Dynamic Partial Reconfiguration
Journal of Signal Processing Systems
Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA
ACM Transactions on Architecture and Code Optimization (TACO)
Compiling control-intensive loops for CGRAs with state-based full predication
Proceedings of the Conference on Design, Automation and Test in Europe
State-based full predication for low power coarse-grained reconfigurable architecture
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Integration, the VLSI Journal
Hi-index | 0.00 |
Deblocking filtering represents one of the most compute intensive tasks in an H.264/AVC standard video decoder due to its demanding memory accesses and irregular data flow. For these reasons, an efficient implementation poses big challenges, especially for programmable platforms. In this sense, the mapping of this decoder's functionality onto a C-programmable coarse-grained reconfigurable architecture named ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) is presented in this paper, including results from the evaluation of different topologies. The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the filtering as well as an increase in the degree of instruction parallelism (ILP) when compared with an implementation on a Very Long Instruction Word (VLIW) dedicated processor. This demonstrates that high ILP is achievable on the ADRES even for irregular, data-dependent kernels.