Emerging workloads such as Recognition, Mining and Synthesis present great opportunities for many-core parallel computing, but they also place significant demands on the memory system. Spin-based devices have shown great promise in enabling high-density, energy-efficient memory. In this paper, we present the design and evaluation of a many-core domain-specific processor for Recognition and Data Mining (RM) using spin-based memory. The RM processor has a two-level on-chip memory hierarchy consisting of a streaming-access first-level memory and a random-access second-level memory. Based on these memory access characteristics, we suggest the use of Domain Wall Memory (DWM) and Spin Transfer Torque Magnetic RAM (STT MRAM) to realize the first and second levels, respectively. We develop architectural models of DWM and STT MRAM, and use them to evaluate the proposed design and explore various architectural tradeoffs in the RM processor. We evaluate the proposed design by comparing it to a CMOS-based design at the same 45 nm technology node. For three representative RM algorithms (Support Vector Machines, k-means clustering, and GLVQ classification), the iso-area spin-memory-based design achieves an energy-delay product improvement of 1.5X to 3X. Our results suggest that spin-based memory technologies can enable significant improvements in energy efficiency and performance for highly parallel, data-intensive workloads.
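
The energy-delay product (EDP) used as the headline metric above is conventionally the product of the energy consumed and the execution time, with lower values being better. The following is a minimal sketch of how an improvement factor of that kind could be computed. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return EDP = energy * delay. Lower is better."""
    if energy_joules < 0 or delay_seconds < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_joules * delay_seconds


def edp_improvement(baseline: tuple[float, float],
                    candidate: tuple[float, float]) -> float:
    """Return the factor by which the candidate's EDP beats the baseline's.

    Each argument is an (energy_joules, delay_seconds) pair.
    A result above 1.0 means the candidate design is better.
    """
    candidate_edp = energy_delay_product(*candidate)
    if candidate_edp == 0:
        raise ValueError("candidate EDP must be non-zero")
    return energy_delay_product(*baseline) / candidate_edp


# Placeholder values only (NOT results from the paper):
# a baseline design versus a design using half the energy at equal delay.
baseline_design = (4.0, 2.0)   # (joules, seconds), hypothetical
candidate_design = (2.0, 2.0)  # (joules, seconds), hypothetical
improvement = edp_improvement(baseline_design, candidate_design)
```

Because EDP multiplies the two quantities, a design that saves energy but runs slower can still end up with a worse EDP, which is why the metric suits comparisons where both energy efficiency and performance matter.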