This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning their data storage structures, their compute structures, and the connectivity between them to the data-flow and data-locality patterns of the kernels they run. Hence, by identifying the key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications. We present one such example, the Convolution Engine (CE), which is specialized for the convolution-like data-flow common in computational photography, image processing, and video processing applications. CE achieves its energy efficiency by capturing data-reuse patterns, eliminating data-transfer overheads, and enabling a large number of operations per memory access. We quantify the trade-offs in efficiency and flexibility and show that CE comes within a factor of 2 to 3 of the energy and area efficiency of custom units optimized for a single kernel, while improving energy and area efficiency by 8 to 15x over a SIMD engine for most applications.
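The abstract does not spell out CE's programming model, so the following is only a minimal illustrative sketch of what "convolution-like data-flow" and "many operations per memory access" can mean. It assumes a kernel can be expressed as a pluggable pairwise map step (input sample against coefficient) followed by a reduce step over a sliding window; the function name `generalized_conv1d` and its parameters are hypothetical and are not CE's actual interface. A fixed-length `deque` stands in for a shift register: each input is read from "memory" once and then reused by up to `k` overlapping windows, so the ratio of map operations to reads approaches `k` for long inputs. The window is applied without flipping the kernel (the cross-correlation form common in image processing).

```python
import operator
from collections import deque
from typing import Callable, List, Sequence, Tuple


def generalized_conv1d(
    signal: Sequence[float],
    kernel: Sequence[float],
    map_op: Callable[[float, float], float],
    reduce_op: Callable[[float, float], float],
) -> Tuple[List[float], int, int]:
    """Hypothetical sketch: slide `kernel` over `signal` ('valid' positions only).

    Returns (outputs, input_reads, map_ops) so the data-reuse ratio is visible.
    """
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")

    window: deque = deque(maxlen=k)  # models a shift register of the last k inputs
    outputs: List[float] = []
    reads = 0
    map_ops = 0

    for x in signal:
        reads += 1        # each input sample is fetched exactly once
        window.append(x)  # deque(maxlen=k) drops the oldest sample automatically
        if len(window) < k:
            continue      # window not yet full
        pairs = zip(window, kernel)
        w0, c0 = next(pairs)
        acc = map_op(w0, c0)
        map_ops += 1
        for w, c in pairs:
            acc = reduce_op(acc, map_op(w, c))
            map_ops += 1
        outputs.append(acc)

    return outputs, reads, map_ops


if __name__ == "__main__":
    # FIR-style filter: map = multiply, reduce = add.
    print(generalized_conv1d([1, 2, 3, 4, 5], [1, 1, 1], operator.mul, operator.add))
    # Sum of absolute differences (block matching): map = |a - b|, reduce = add.
    print(generalized_conv1d([1, 2, 3, 4, 5], [2, 3, 4], lambda a, b: abs(a - b), operator.add))
```

For the 5-sample, 3-tap FIR example the sketch performs 9 map operations on 5 reads; swapping only `map_op` and `reduce_op` turns the same data-flow into a block-matching (SAD) kernel, which is the kind of reuse across applications the abstract argues for.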