A compiler for scalable placement and routing of brain-like architectures

Authors:
Narayan Srinivasa
Affiliations:
HRL Laboratories LLC, Malibu, CA, USA
Venue:
Proceedings of the 2013 ACM International Symposium on Physical Design
Year:
2013

References:
Balancing interconnect and computation in a reconfigurable computing array (or, why you don't really want 100% LUT utilization)
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays

FastPlace 3.0: A Fast Multilevel Quadratic Placement Algorithm with Placement Congestion Control
ASP-DAC '07 Proceedings of the 2007 Asia and South Pacific Design Automation Conference

VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Abstract

The challenge of building neuromorphic circuits in mature CMOS technology to match brain-like architectures is two-fold: scalability and connectivity. Scalability means that the circuits must be expandable to match biological brains in terms of synaptic and neuronal densities. The challenge here is to implement 10^6 neurons and 10^10 synapses, with an average fanout of 10^4, in a square centimeter of CMOS [1, 2]. Connectivity means that the circuit must support both short- and long-range (by physical distance) connections between neurons. A large part of this challenge is how to implement a connectivity of 10^4 synapses per neuron [3]. Unfortunately, even the exponential growth in transistor density experienced today is not sufficient to realize such massive connectivity and synaptic densities in a traditional CMOS process. Recent approaches to these challenges integrate CMOS with nanotechnology [4, 5] in order to achieve the required synaptic densities. These solutions predominantly use crossbar architectures, but connectivity remains a daunting challenge for them [2, 6]. To meet these challenges, a novel synaptic time-multiplexing (STM) concept was developed along with a neural fabric design [7]. This combination offers greater flexibility and long-range connectivity. It also provides a method to overcome the limitations of conventional CMOS technology in matching the synaptic density and connectivity requirements found in mammalian brains while maintaining non-linear synapses and learning. In order to program neuromorphic hardware [8] for any desired brain architecture, the topology must first be converted into a connectivity matrix or a graph representation. This matrix, along with statistics on the number of neurons and synapses, is provided as input to a neuromorphic compiler.
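The conversion of a topology into a graph representation plus summary statistics can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the compiler's input format, and the function name `build_connectivity`, the edge-list input, and the statistic names are hypothetical. Note that the target figures quoted above are mutually consistent, since 10^6 neurons with an average fanout of 10^4 give 10^10 synapses.

```python
from collections import defaultdict


def build_connectivity(num_neurons, synapses):
    """Build fanout adjacency lists and summary statistics.

    Hypothetical helper: `synapses` is an iterable of (pre, post) neuron
    index pairs. The real compiler's input format is not given in the source.
    """
    adjacency = defaultdict(list)
    count = 0
    for pre, post in synapses:
        if not (0 <= pre < num_neurons and 0 <= post < num_neurons):
            raise ValueError(f"synapse ({pre}, {post}) references an unknown neuron")
        adjacency[pre].append(post)
        count += 1
    stats = {
        "neurons": num_neurons,
        "synapses": count,
        "average_fanout": count / num_neurons if num_neurons else 0.0,
        "max_fanout": max((len(v) for v in adjacency.values()), default=0),
    }
    return dict(adjacency), stats


# Toy network with 4 neurons and 6 synapses (illustrative data only).
toy_synapses = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 1)]
graph, stats = build_connectivity(4, toy_synapses)
```

An adjacency list is used rather than a dense matrix because a dense matrix over 10^6 neurons would need 10^12 entries, while the network described has only about 10^10 synapses.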
The neuromorphic compiler compiles the neural network structure description into: 1) an assignment of the network's neurons and synapses to hardware neurons and virtual (multiplexed) synapses, and 2) an STM-compatible routing schedule with switch states for the neural fabric at each STM timeslot. For each neuron, the exact location on the neural fabric of the chip must be determined. This is the placement problem. The quality of neuron placement affects the ability of the routing algorithm to efficiently find the synaptic pathways needed to cover all the synapses within an STM duty cycle. For each synaptic pathway, the set of grid lines required to connect an output axon of the presynaptic neuron to an input dendrite of the postsynaptic neuron must be determined, and the switches along the way must be set to the ON state. This is the routing problem. The placement and routing problems are closely related to those in other programmable hardware such as the FPGA (Field Programmable Gate Array), but there are some interesting differences between the neuromorphic solution proposed in this work and those designed for such hardware. In FPGA applications, most current algorithms [9-11] for placement and routing expect a single timeslot and therefore do not have to address the immense routing demands of the problem described here. Unlike FPGA circuits, the neuromorphic hardware is expected to use every neuron device during routing. However, a study of FPGA architecture [12] shows that, on reconfigurable hardware, 100% device utilization results in an increase in routing area of almost 200% due to congestion problems.
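The interplay between placement, routing, and STM timeslots can be sketched with a toy example. This is not the algorithm of the paper, which the abstract does not describe. It assumes a simple 2D grid fabric, a naive row-major placement, x-then-y routes, and a greedy rule under which two synapses may share a timeslot only if their routes use no common grid segment. All function names are hypothetical.

```python
def place_row_major(num_neurons, width):
    """Naive placement (assumption): neuron i sits at cell (i % width, i // width)."""
    return {n: (n % width, n // width) for n in range(num_neurons)}


def l_shaped_path(src, dst):
    """Grid segments of an x-then-y route, each as an unordered pair of cells."""
    (x, y), (tx, ty) = src, dst
    segments = []
    while x != tx:
        nx = x + (1 if tx > x else -1)
        segments.append(frozenset({(x, y), (nx, y)}))
        x = nx
    while y != ty:
        ny = y + (1 if ty > y else -1)
        segments.append(frozenset({(x, y), (x, ny)}))
        y = ny
    return segments


def schedule_synapses(placement, synapses):
    """Greedily put each synapse in the first timeslot with no shared segment."""
    slots = []  # one (used segments, routed synapses) pair per timeslot
    for pre, post in synapses:
        path = l_shaped_path(placement[pre], placement[post])
        for used, routed in slots:
            if used.isdisjoint(path):
                used.update(path)
                routed.append((pre, post))
                break
        else:
            # No existing timeslot can carry this route; open a new one.
            slots.append((set(path), [(pre, post)]))
    return [routed for _, routed in slots]


# Four neurons on a 2x2 grid (illustrative data only).
placement = place_row_major(4, width=2)
schedule = schedule_synapses(placement, [(0, 1), (1, 0), (0, 3), (2, 3)])
```

In this toy run, synapses (0, 1), (1, 0) and (0, 3) all need the segment between cells (0, 0) and (1, 0), so they land in three different timeslots, while (2, 3) shares the first timeslot with (0, 1). A different placement would change which routes collide, which is why placement quality affects how many timeslots the router needs.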