P2012 is an area- and power-efficient many-core computing accelerator built from multiple globally asynchronous, locally synchronous (GALS) processor clusters. Each cluster features up to 16 processors with independent instruction streams that share a multi-banked L1 data memory with one-cycle access, a multi-channel DMA engine, and specialized hardware for synchronization and aggressive power management. P2012 is ready for 3D stacking and can be customized for extreme area and energy efficiency by adding domain-specific HW IPs to the cluster. The first P2012 SoC prototype in 28 nm CMOS will sample in Q3. It features four 16-processor clusters and a 1 MB L2 memory, and delivers 80 GOPS (with 32-bit single-precision floating-point support) in 18 mm² with a worst-case power consumption of 2 W. P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components, the latter giving the highest level of control over application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper discusses preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs.
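The headline figures quoted above imply a few derived efficiency metrics, which the following minimal Python sketch works out. The constants are taken directly from the abstract; the function name `derived_metrics` and the dictionary keys are illustrative choices, not part of any P2012 tooling. The per-processor figure additionally assumes that the 80 GOPS peak is spread evenly over the 64 processors, which the abstract does not state.

```python
# Figures quoted in the abstract for the first P2012 SoC prototype.
CLUSTERS = 4
PROCESSORS_PER_CLUSTER = 16
PEAK_GOPS = 80.0            # peak throughput, giga-operations per second
AREA_MM2 = 18.0             # silicon area in mm^2
WORST_CASE_POWER_W = 2.0    # worst-case power consumption in watts


def derived_metrics():
    """Return efficiency figures implied by the quoted numbers.

    Hypothetical helper for illustration only. The per-processor value
    assumes throughput is attributed evenly to all processors.
    """
    processors = CLUSTERS * PROCESSORS_PER_CLUSTER
    return {
        "processors": processors,
        "gops_per_watt": PEAK_GOPS / WORST_CASE_POWER_W,
        "gops_per_mm2": PEAK_GOPS / AREA_MM2,
        "gops_per_processor": PEAK_GOPS / processors,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Because the power figure is a worst-case bound, the computed GOPS/W should be read as a lower bound on peak energy efficiency rather than as a measured value.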