Function inlining and loop unrolling for loop acceleration in reconfigurable processors

Authors:
Narasinga Rao Miniskar;Pankaj Shailendra Gode;Soma Kohli;Donghoon Yoo
Affiliations:
Samsung India Software Operations Pvt. Ltd, Bengaluru, India;Samsung India Software Operations Pvt. Ltd, Bengaluru, India;Samsung India Software Operations Pvt. Ltd, Bengaluru, India;Samsung Electronics, Giheung, South Korea
Venue:
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2012

Abstract

Next-generation SoCs for consumer electronics need software solutions that offer faster time-to-market, lower development cost, and higher performance while keeping energy consumption and area low. As a result, reconfigurable processors (RPs) have become increasingly important: they provide just enough flexibility to accept software solutions while offering application-specific hardware reconfigurability. Samsung Electronics has developed a reconfigurable processor called the Samsung Reconfigurable Processor (SRP), which is the basis of our work. Although the SRP is a powerful processor, it requires an intelligent compiler that can compile application software while making use of its reconfigurable architecture. The existing compiler for the SRP does not support function inlining and loop unrolling, and no study has yet been done on these optimizations for RPs. In this paper, we study the impact of these optimizations on the performance of applications for the SRP, and we show how they are supported in the SRP compiler. We analyze the performance improvement due to these optimizations on several benchmarks, namely the Sobel edge filter, JPEG decoder, and the Luma Deblocking filter of the H.264 standard. Our experimental results show a performance gain of about 83% with function inlining and loop unrolling over the original code for the Sobel filter and JPEG encoder, and a gain of 11% for the Luma Deblocking filter.
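
For readers unfamiliar with the two optimizations named in the abstract, the following is a minimal source-level sketch in C. It is not taken from the paper and does not reproduce the SRP compiler's transformation. The helper `abs_diff` and the functions `gradient_baseline` and `gradient_inlined_unrolled` are hypothetical names chosen for illustration, and the unroll factor of 2 is an arbitrary assumption. The first function calls a helper in the loop body; the second shows the same computation with the call inlined by hand and the loop unrolled, plus a remainder loop for leftover iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: absolute difference of two pixel values, a
   stand-in for the per-pixel gradient step of an edge filter. */
static int abs_diff(int a, int b) {
    return (a > b) ? (a - b) : (b - a);
}

/* Baseline: one function call per loop iteration.
   Writes n - 1 values to out when n >= 2, and nothing otherwise. */
void gradient_baseline(const uint8_t *in, int *out, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i) {
        out[i] = abs_diff(in[i + 1], in[i]);
    }
}

/* Same computation with the call inlined by hand and the loop
   unrolled by a factor of 2 (the factor is an assumption). */
void gradient_inlined_unrolled(const uint8_t *in, int *out, size_t n) {
    size_t i = 0;

    /* Unrolled main loop: two outputs per iteration, no calls. */
    for (; i + 2 < n; i += 2) {
        int d0 = in[i + 1] - in[i];
        int d1 = in[i + 2] - in[i + 1];
        out[i]     = (d0 < 0) ? -d0 : d0;
        out[i + 1] = (d1 < 0) ? -d1 : d1;
    }

    /* Remainder loop: handles the iteration left over when the
       trip count is not a multiple of the unroll factor. */
    for (; i + 1 < n; ++i) {
        int d = in[i + 1] - in[i];
        out[i] = (d < 0) ? -d : d;
    }
}
```

Both functions produce identical output for any input length. Removing the call and widening the loop body gives a scheduler more independent operations per iteration to work with, which is the general motivation for applying these transformations before loop acceleration.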