Machine-learning tasks are becoming pervasive across a broad range of domains and systems, from embedded devices to data centers. At the same time, a small set of machine-learning algorithms, especially convolutional and deep neural networks (CNNs and DNNs), is proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms; however, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design a high-throughput accelerator capable of performing 452 GOP/s of key NN operations (synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² at 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster and reduces total energy by 21.08x. These characteristics are obtained after layout in a 65 nm process. Such a high throughput in a small footprint can open up the use of state-of-the-art machine-learning algorithms in a broad set of systems and applications.
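To make the operation count concrete, the kernel below is a minimal C sketch of the computation such an accelerator targets: a fully connected (classifier) layer that multiplies each input neuron by a synaptic weight, accumulates the products, and applies a nonlinearity. The layer sizes, names, and the sigmoid activation are illustrative assumptions, not the paper's actual design.

    #include <math.h>
    #include <stddef.h>

    /* Sketch of a fully connected NN layer. Each synapse contributes one
     * multiply and one add, which is exactly what the GOP/s figure counts.
     * Weights are stored row-major: w[o * n_in + i] connects input i to
     * output neuron o. */
    void fc_layer(size_t n_in, size_t n_out,
                  const float in[], const float w[/* n_out * n_in */],
                  const float bias[], float out[])
    {
        for (size_t o = 0; o < n_out; o++) {
            float acc = bias[o];
            for (size_t i = 0; i < n_in; i++)
                acc += w[o * n_in + i] * in[i];   /* weight multiply + add */
            out[o] = 1.0f / (1.0f + expf(-acc));  /* placeholder sigmoid */
        }
    }

As a rough sanity check of the throughput claim: with hypothetical sizes n_in = 1000 and n_out = 500, this layer performs 10^6 operations (one multiply and one add per synapse), which at 452 GOP/s would take about 2.2 µs.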