Accelerating vision and navigation applications on a customizable platform

Authors:
J. Cong;B. Grigorian;G. Reinman;M. Vitanza
Affiliations:
Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA;Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA;Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA;Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA
Venue:
ASAP '11 Proceedings of the ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors
Year:
2011



Abstract

The domain of vision and navigation often includes applications for feature tracking as well as simultaneous localization and mapping (SLAM). As these problems require computationally demanding solutions, it is challenging to achieve high performance without sacrificing the fidelity of results or otherwise consuming excessive amounts of energy. Our goal then is to accelerate the applications in this domain to meet real-time performance constraints while simultaneously reducing energy consumption and avoiding degradation in the quality of results. To achieve this domain-specific acceleration, we model a customizable hardware platform based on the 3D integration of a Field-Programmable Gate Array (FPGA) atop a standard chip multiprocessor (CMP) with Through-Silicon Vias (TSVs) used for communication between the two layers. Furthermore, partial automation of accelerator creation using C-to-RTL tools allows for analysis of a wide range of candidates. In this work, we mathematically characterize viable accelerator candidates, describe ideal application code for acceleration, and outline a dynamic-programming-based methodology for selecting an optimal set of candidates. Our results yield an overall speedup and energy reduction of 9.56X along with a 94X EDP reduction for the domain. Finally, we investigate the effects of various interconnect models on our performance improvements. Overall, our proposed system is shown to be highly efficient in both accelerating performance and saving energy for compute-intensive applications in this domain.
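The abstract mentions a dynamic-programming-based methodology for selecting accelerator candidates but does not state its formulation. As a rough illustration only, the sketch below treats selection as a 0/1 knapsack problem. It assumes a single integer FPGA area budget and independent, additive benefits per candidate; neither assumption comes from the paper. The names `Candidate`, `area`, `benefit`, and `select_accelerators` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical accelerator candidate (fields are illustrative assumptions)."""
    name: str
    area: int       # assumed FPGA resource cost, in integer units
    benefit: float  # assumed estimated gain (e.g. cycles or energy saved)


def select_accelerators(cands: List[Candidate], budget: int) -> Tuple[float, List[str]]:
    """Pick the subset with maximum total benefit whose total area fits the budget."""
    n = len(cands)
    # best[i][b] = best total benefit using the first i candidates within area b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, c in enumerate(cands, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip candidate i
            if c.area <= b:
                take = best[i - 1][b - c.area] + c.benefit  # option 2: take it
                if take > best[i][b]:
                    best[i][b] = take

    # Walk the table backwards to recover which candidates were taken.
    chosen: List[str] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(cands[i - 1].name)
            b -= cands[i - 1].area
    chosen.reverse()
    return best[n][budget], chosen
```

A call such as `select_accelerators([Candidate("k1", 3, 4.0), Candidate("k2", 4, 5.0)], budget=5)` returns the best total benefit and the names of the selected candidates. A real selection flow would likely need to account for interactions between accelerators and for communication cost over the interconnect, which this sketch ignores.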