Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications

  • Authors:
  • Diego Melpignano;Luca Benini;Eric Flamand;Bruno Jego;Thierry Lepley;Germain Haugou;Fabien Clermidy;Denis Dutoit

  • Affiliations:
  • STMicroelectronics - AST, Grenoble, France;STMicroelectronics - AST, Grenoble, France and University of Bologna - DEIS, Bologna, Italy;STMicroelectronics - AST, Grenoble, France;STMicroelectronics - AST, Grenoble, France;STMicroelectronics - AST, Grenoble, France;STMicroelectronics - AST, Grenoble, France;CEA-LETI, Grenoble, France;CEA-LETI, Grenoble, France

  • Venue:
  • Proceedings of the 49th Annual Design Automation Conference
  • Year:
  • 2012

Abstract

P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous (GALS) processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked, one-cycle-access L1 data memory, a multi-channel DMA engine, and specialized hardware for synchronization and aggressive power management. P2012 is 3D-stacking ready and can be customized to achieve extreme area and energy efficiency by adding domain-specific HW IPs to the cluster. The first P2012 SoC prototype in 28 nm CMOS will sample in Q3, featuring four 16-processor clusters and a 1 MB L2 memory, and delivering 80 GOPS (with 32-bit single-precision floating-point support) in 18 mm² with 2 W power consumption (worst case). P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components to achieve the highest level of control over application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper discusses preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs.
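
The prototype figures quoted in the abstract can be combined into rough efficiency ratios. The sketch below is only back-of-the-envelope arithmetic on those quoted numbers, not a result reported by the authors. It assumes the 80 GOPS peak, the 18 mm² area, and the 2 W worst-case power all refer to the same four-cluster configuration, and that the peak throughput is spread evenly over the 64 processors. The function and constant names are illustrative.

```python
# Figures quoted in the abstract for the first P2012 SoC prototype.
PEAK_GOPS = 80.0             # peak throughput, GOPS
AREA_MM2 = 18.0              # silicon area, mm^2
POWER_W = 2.0                # worst-case power consumption, W
CLUSTERS = 4                 # clusters in the prototype
PROCESSORS_PER_CLUSTER = 16  # processors per cluster


def derived_metrics():
    """Return rough efficiency ratios derived from the quoted figures.

    Assumption (not stated in the abstract): all figures describe the same
    configuration, and throughput divides evenly across processors.
    """
    processors = CLUSTERS * PROCESSORS_PER_CLUSTER
    return {
        "processors": processors,
        "gops_per_watt": PEAK_GOPS / POWER_W,
        "gops_per_mm2": PEAK_GOPS / AREA_MM2,
        "gops_per_processor": PEAK_GOPS / processors,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the prototype works out to 64 processors, about 40 GOPS/W, roughly 4.4 GOPS/mm², and 1.25 GOPS per processor. Because the power figure is a worst case, the energy-efficiency ratio is best read as a lower bound at peak throughput.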