Machine-learning tasks are becoming pervasive across a broad range of domains and systems, from embedded devices to data centers. At the same time, a small set of machine-learning algorithms, especially convolutional and deep neural networks (CNNs and DNNs), is proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms; however, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design a high-throughput accelerator capable of performing 452 GOP/s of key NN operations (synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² at 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster and reduces total energy by 21.08x. These characteristics are obtained after layout in a 65 nm process. Such a high throughput in a small footprint can open up the use of state-of-the-art machine-learning algorithms in a broad set of systems and applications.
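To make the operation count concrete, the kernel below is a minimal C sketch of the computation such an accelerator targets: a fully connected (classifier) layer that multiplies each input neuron by a synaptic weight, accumulates the products, and applies a nonlinearity. The layer sizes, names, and the sigmoid activation are illustrative assumptions, not the paper's actual design.

    #include <math.h>
    #include <stddef.h>

    /* Sketch of a fully connected NN layer. Each synapse contributes one
     * multiply and one add, which is exactly what the GOP/s figure counts.
     * Weights are stored row-major: w[o * n_in + i] connects input i to
     * output neuron o. */
    void fc_layer(size_t n_in, size_t n_out,
                  const float in[], const float w[/* n_out * n_in */],
                  const float bias[], float out[])
    {
        for (size_t o = 0; o < n_out; o++) {
            float acc = bias[o];
            for (size_t i = 0; i < n_in; i++)
                acc += w[o * n_in + i] * in[i];   /* weight multiply + add */
            out[o] = 1.0f / (1.0f + expf(-acc));  /* placeholder sigmoid */
        }
    }

As a rough sanity check of the throughput claim: with hypothetical sizes n_in = 1000 and n_out = 500, this layer performs 10^6 operations (one multiply and one add per synapse), which at 452 GOP/s would take about 2.2 µs.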