Title:
NEURAL ARCHITECTURES AND SYSTEMS AND METHODS OF THEIR TRANSLATION
Document Type and Number:
WIPO Patent Application WO/2018/175538
Kind Code:
A1
Abstract:
A method and a system for implementing a mathematical algorithm in a neural architecture and transferring that neural architecture to an integrated circuit (IC) chip. The neural architecture has neurons that are capable of converting current to frequency, voltage to frequency, frequency to frequency and time to frequency. The neurons can have multi-sensor inputs (multiple synapses) for either scaling or inhibiting neuron outputs. The neural architecture-to-hardware conversion method is specifically tailored for neural architectures for image processing applications.

Inventors:
WOOD RICHARD JOSEPH (US)
HOLLOSI BRENT (US)
D'ANGELO ROBERT (US)
UY WES (US)
FREIFELD GEREMY (US)
LOWRY NATHAN (US)
HUANG HAIYAO (US)
POPPE DOROTHY CAROL (US)
SALTHOUSE CHRISTOPHER (US)
Application Number:
PCT/US2018/023501
Publication Date:
September 27, 2018
Filing Date:
March 21, 2018
Assignee:
CHARLES STARK DRAPER LABORATORY INC (US)
International Classes:
G06N3/063; G06N3/04
Foreign References:
EP2672432A2 (2013-12-11)
Other References:
A. VAN SCHAIK: "Building blocks for electronic spiking neural networks", NEURAL NETWORKS, vol. 14, no. 6-7, 9 July 2001 (2001-07-09), pages 617 - 628, XP004310067, DOI: 10.1016/S0893-6080(01)00067-3
J. PARK ET AL: "Autonomous neuromorphic system with four-terminal Si-based synaptic devices", PROCEEDINGS OF THE 31ST INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC'16), 16 June 2016 (2016-06-16), pages 331 - 334, XP055497716, Retrieved from the Internet [retrieved on 20160710]
I. SOURIKOPOULOS ET AL: "A 4-fJ/spike artificial neuron in 65 nm CMOS technology", FRONTIERS IN NEUROSCIENCE, vol. 11, 123, 15 March 2017 (2017-03-15), XP055382019, DOI: 10.3389/fnins.2017.00123
A. BASU ET AL: "Neural dynamics in reconfigurable silicon", IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, vol. 4, no. 5, 3 August 2010 (2010-08-03), pages 311 - 319, XP011316092, DOI: 10.1109/TBCAS.2010.2055157
Attorney, Agent or Firm:
HOUSTON, J. Grant (US)
Claims:
CLAIMS

What is claimed is:

1. A neuromorphic system for processing signals from a sensor, comprising:

a synapse; and

a neuron that receives a current from the synapse and produces a frequency output.

2. The system of claim 1, wherein the synapse is controlled by a bias voltage.

3. The system of claim 1, wherein the synapse receives a current from a photodetector.

4. The system of claim 3, wherein the synapse or neuron is controlled by a second sensor.

5. The system of claim 1, further comprising multiple synapses feeding into the same neuron.

6. The system of claim 1, wherein the neuron comprises a capacitor that is charged by the current from the synapse.

7. The system of claim 6, further comprising a comparator that compares the voltage on the capacitor to a threshold voltage and resets the capacitor based on the comparison of the threshold voltage with the voltage of the capacitor.

8. The system of claim 1, further comprising multiple neuromorphic circuit elements, having synapses and neurons, that receive inputs from a single image sensor.

9. The system of claim 8, wherein the neuromorphic circuit elements can have multiple reinforcing or inhibiting inputs, such as from imaging and sound systems.

10. The system of claim 8, wherein frequency outputs from multiple neurons are collected to perform a convolution.

11. The system of claim 10, wherein the convolution is performed on the output of an image sensor.

12. A method for embedding in an integrated circuit chip a neuromorphic architecture comprising:

providing multiple neuromorphic circuit elements; and

performing a convolution with the circuit elements.

13. The method of claim 12, wherein the neuromorphic circuit elements can have multiple synapses representing inputs from a single image sensor.

14. The method of claim 12, wherein the neuromorphic circuit elements can have multiple reinforcing or inhibiting inputs, such as from imaging and sound systems.

15. A method of embedding an algorithm into a neuromorphic architecture, comprising:

providing a desired algorithm;

generating a hardware-optimized algorithm;

generating a neuron-optimized algorithm; and

providing a Verilog description of a chip design obtained from a neural network definition of the neuron-optimized algorithm, which in turn is obtained from the hardware-optimized algorithm.

Description:
NEURAL ARCHITECTURES AND SYSTEMS AND METHODS OF THEIR TRANSLATION

RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. 119(e) of U.S. Provisional Application No. 62/474,353, filed on March 21, 2017, which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] An Artificial Neural Network (ANN) is a computational model (algorithm) based on the neuron model of the human brain. A simple model of a neuron has one or more input nodes (input layer), followed by computer processing (in a hidden layer), which leads to one or more output nodes (output layer). In the human nervous system, neurons are connected to each other and to inputs via synapses. An ANN is a simplified abstraction of its biological counterpart, the biological neural network (BNN). In an ANN, synapses are weights given to the input nodes. Thus the value of a synapse weight is a measure of the strength of the contribution of the corresponding input in determining the output nodes.

[0003] ANNs are widely used for machine learning tasks such as stock market prediction, character recognition, speech recognition, threat recognition, machine vision (also known as computer vision), and image compression, to name just a few applications. In general, neural networks are useful for modeling scenarios where the input-output relationship follows complicated patterns that are difficult to visualize and model.

[0004] The importance of the role of neurons in human vision is readily apparent when one compares current computer-based image processing systems to the way humans interpret image stimulations (i.e., visible light). Human vision is almost instantaneous upon receiving light, whereas computer image processing is much slower. In computer image processing, acquired data must be off-loaded to a processor external to the focal plane array for processing. On the other hand, if the pixels in a focal plane array could perform some basic image processing tasks, it would go a long way toward making machine vision approach the speed of human vision. This capability would amount to pixels behaving, in a rough approximation, like the neurons responsible for human vision. A neuromorphic focal plane array would be such a focal plane array.

[0005] Similarly, speech recognition using computers can potentially be accelerated by orders of magnitude if sound signals can be analyzed (at least partially) at the sensor level without post-processing using external processors.

SUMMARY OF THE INVENTION

[0006] This invention concerns neural architectures and systems. In one example, they could be part of a neuromorphic focal plane array capable of real time vision processing.

[0007] The invention describes methods by which a mathematical algorithm is realized in a neural network architecture and then instantiated in neuromorphic circuits and hardware for processing information. "Neuromorphic" here denotes that elements of the network and its physical realization (i.e., circuit or hardware) behave like biological neurons since they accept inputs and process them into intermediate outputs at the neuron level in real-time or near real-time.

[0008] In addition to computer vision, such architectures can be used for sound processing also. However, the emphasis in this invention is on neuromorphic pixels and focal plane arrays of pixels.

[0009] This invention shows neuromorphic models of pixels and operations of pixel arrays, and presents a method for translating them into a circuit board.

[0010] In general, according to one aspect, the invention disclosed here describes a method and a system for implementing a mathematical algorithm as a neural architecture and realizing that architecture on an integrated circuit (IC) chip. The neural architecture has neurons that are capable of converting current to frequency, voltage to frequency, frequency to frequency, and time to frequency. The neurons can have multi-sensor inputs (multiple synapses) for either scaling or inhibiting neuron outputs.

[0011] The mathematical algorithm-to-neural architecture-to-hardware conversion method and system is general in nature but has been specifically demonstrated on mathematical algorithms designed for image processing applications.

[0012] In general, according to another aspect, the invention features a neuromorphic system for processing signals from a sensor. The system comprises a synapse and a neuron that receives a current from the synapse and produces a frequency output.

[0013] Preferably, the synapse is controlled by a bias voltage. It might receive a current from a photodetector.

[0014] In some examples, the synapse or neuron is controlled by a second sensor. Also, multiple synapses can feed into the same neuron.

[0015] In embodiments, the neuron comprises a capacitor that is charged by the current from the synapse. A comparator then compares the voltage on the capacitor to a threshold voltage and resets the capacitor based on a comparison of the threshold voltage with the voltage of the capacitor.

[0016] To process information from an image sensor, frequency outputs from multiple neurons are collected to perform a convolution.

[0017] In general, according to another aspect, the invention features a method for embedding in an integrated circuit chip a neuromorphic architecture. The method comprises providing multiple neuromorphic circuit elements and performing a convolution with the circuit elements.

[0018] In general, according to another aspect, the invention features a method of embedding a mathematical algorithm into a neuromorphic architecture. The method comprises providing a desired algorithm, generating a hardware-optimized algorithm, generating a neuron-optimized algorithm, and providing a Verilog description of a chip design obtained from a neural network definition of the neuron-optimized algorithm, which in turn is obtained from the hardware-optimized algorithm.

[0019] The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] In the accompanying drawings, reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis has instead been placed upon illustrating the principles of the invention. Of the drawings:

[0021] Fig. 1A is a circuit diagram showing an LIF neuron model, with notional timing diagrams in Fig. 1B (capacitor voltage) and Fig. 1C (output voltage).

[0022] Fig. 2 is a circuit diagram showing current-to-frequency mode conversion (type 1) using an LIF neuron.

[0023] Fig. 3 is a circuit diagram showing voltage-to-frequency mode conversion (type 2) using an LIF neuron.

[0024] Fig. 4A is a circuit diagram showing frequency-to-frequency mode conversion (type 3) using an LIF neuron, with notional voltage timing diagrams in Fig. 4B (capacitor) and Fig. 4C (output).

[0025] Fig. 5 is a circuit diagram showing time-to-frequency mode conversion (type 4) using an LIF neuron.

[0026] Fig. 6 is a circuit diagram showing a multi-sensor (e.g., light and sound sensors) LIF neuron configuration.

[0027] Fig. 7A is a circuit diagram showing how one sensor can augment (scale) a second sensor.

[0028] Fig. 7B is a circuit diagram showing how one sensor can compete against or inhibit the output of a second sensor.

[0029] Fig. 8 is a circuit diagram showing weighted addition of two inputs in frequency-mode addition.

[0030] Fig. 9 is a schematic diagram showing a simplified model of frequency-mode addition.

[0031] Fig. 10 is a schematic diagram representing a 3×3 convolution using the simplified frequency-mode addition model of Fig. 9.

[0032] Fig. 11 shows the steps involved in implementing mathematical neural algorithms on neuromorphic hardware.

[0033] Fig. 12 shows how a tree of 2-synapse neurons can replicate the function of a single 9-synapse neuron.

[0034] Fig. 13 shows the pixel numbering system for a 2-D image.

[0035] Figs. 14A-14H show image pixels and "extra pixels" for 3×3 convolution of side and corner pixels.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[ 0036] The invention now will be described more fully hereinafter with reference to the accompanying drawings, in which illustrative embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

[0037] As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Further, the singular forms and the articles "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms: includes, comprises, including and/or comprising, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, it will be understood that when an element, including component or subsystem, is referred to and/or shown as being connected or coupled to another element, it can be directly connected or coupled to the other element or intervening elements may be present.

[0038] It will be understood that although terms such as "first" and "second" are used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. Thus, an element discussed below could be termed a second element, and similarly, a second element may be termed a first element without departing from the teachings of the present invention.

[0039] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[0040] The basic element is a linear integrate-and-fire (LIF) neuron, as exhibited in Fig. 1A. It comprises a synapse 10 and a neuron 20. The synapse comprises a FET (field-effect transistor) 110 or a series of FETs; the bias voltage Vbias controls the current flow I in the synapse 10. The neuron comprises an integrating capacitor C, a comparator COMP, and a reset FET 112. Basic operation involves current I, typically from some type of sensor such as a photodetector, charging the capacitor through the synapse. Once the top plate of capacitor C reaches a threshold voltage, Vth, the comparator COMP fires (a burst of fixed intensity and duration), as shown in the notional timing diagrams of Figs. 1B (capacitor voltage) and 1C (output voltage Vout). This event can be used to propagate information and to reset the capacitor voltage, allowing subsequent integrate-and-fire cycles to occur.
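The integrate-and-fire behavior described in paragraph [0040] can be sketched as a discrete-time behavioral model. The capacitance, threshold, and time step below are illustrative values, not taken from the patent:

```python
def lif_spike_times(i_syn, c=1e-12, v_th=1.0, dt=1e-6, n_steps=1000):
    """Integrate a constant synapse current onto capacitor C; whenever the
    capacitor voltage reaches the comparator threshold Vth, record a spike
    time and reset the capacitor (the reset FET's role in Fig. 1A)."""
    v, spikes = 0.0, []
    for step in range(n_steps):
        v += (i_syn / c) * dt           # I = C * dV/dt
        if v >= v_th - 1e-9:            # tiny tolerance for float accumulation
            spikes.append((step + 1) * dt)
            v = 0.0                     # reset; next integrate-and-fire cycle
    return spikes
```

Doubling the synapse current doubles the firing rate, which is the frequency-coded behavior exploited in the Type 1 mode below.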

[0041] This LIF node is capable of several types of data processing and transformations depending on the synapse's gate and source stimulus and the comparator's configuration. Furthermore, the synapse enables weighting of the integrated charge through numerous methods, e.g., FET width scaling, multiple synaptic paths, and adaptable gate voltage bias via wired control or a programmable floating gate. This can be used to perform scalar or non-linear functions, allowing for features like per-neuron gain control or more complex mathematical operations like logarithmic transformations.

[0042] In summary, the core LIF node has several interesting characteristics: 1) capability to process voltage, current, frequency, or time information; 2) output in frequency or time; 3) direct interface with digital logic or subsequent LIF stages, enabling further quantization or computation; 4) input scaling via synapse modulation; 5) linear or non-linear input-to-output relationship (as configured); and 6) very low power consumption.

[0043] When applied to large sensor systems, such as image sensors, this node can provide a variety of valuable features: 1) low-power data conversion between sensors and the digital layer; 2) reconfigurable synaptic modulation for real-time scaling changes; 3) multi-modal processing, i.e., processing multiple sensor streams at the same time; and 4) low-power pre-processing, e.g., 2-D convolution, multiplication/division, and saliency.

[0044] Data Conversion Capabilities of the LIF:

[0045] Although the output of LIF neurons is a frequency (voltage spikes per second), the input can be current, voltage, frequency, or time, since these inputs are mathematically related as discussed below.

[0046] Type 1: Current to Frequency (Fig. 2):

[0047] In this mode, shown in Fig. 2, a sensor 114 provides current information input Isensor to the synapse 10, from which it emerges as current I. I is integrated onto the capacitor C. When the capacitor voltage reaches Vth, the comparator COMP produces a fixed-width pulse. In this way, the comparator produces fixed-width pulses at a rate proportional to the supplied current, making the output a frequency-coded representation of the sensor current. The sensor current is scaled from 0 to 1 based on the drain current of the synapse, which is controlled by Vbias; Vbias may be an analog value or a frequency/time-coded signal. The mechanics of this interaction are governed by Eq. 1 (C = capacitance; I = current), where the smaller current is chosen to keep the frequency in an acceptable range.
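Eq. 1 is not reproduced in this text. For an ideal current-driven LIF stage, however, the firing rate takes the standard form f = I / (C·Vth), since the capacitor needs t = C·Vth / I to reach threshold. A one-line sketch under that assumption:

```python
def lif_output_frequency(i, c=1e-12, v_th=1.0):
    # Ideal Type 1 firing rate: time-to-threshold is C*Vth/I, so f = I/(C*Vth).
    # The standard LIF form is assumed here; Eq. 1 itself is not reproduced
    # in the text.
    return i / (c * v_th)
```

For example, 10 nA into 1 pF with a 1 V threshold gives a 10 kHz output, matching the spike count of the behavioral model above.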

[0048] Type 2: Voltage to Frequency (Fig. 3):

[0049] In this mode, a sensor 114 provides voltage information, which produces the integrated current modulated by the resistance of the synapse. A depiction is given in Fig. 3, with the governing equation in Eq. 2.

[0050] Because of the voltage source, more current-scaling options can be employed, e.g., FET widening and multiple synaptic paths per input. This allows current gains greater than 1 to be achieved.

[0051] Type 3: Frequency to Frequency (Figs. 4A-4C):

[0052] In this mode, the voltage source 114 of the synapse is fixed, and fixed-width pulse trains Fin are used to stimulate the synaptic channel. The comparator COMP and reset FET 112 are tuned to generate equivalently sized pulse widths upon firing. An example circuit (Fig. 4A) and notional charging (Fig. 4B) and output (Fig. 4C) depictions are shown. The integrate-and-fire mechanics are governed by Eq. 3, where Fin is the input frequency and the left-hand side (LHS) is the output frequency, denoted by Fout. Note that Fin and Fout are not the same, as both the comparator COMP and the synapse 10 modulate the firing rate; the pulse widths ΔT are fixed, but the presence or absence of a pulse depends on the input frequency.

[0053] Type 4: Time to Frequency (Fig. 5):

[0054] This mode differs from the Type 3 mode only in that the input is a time Tin rather than a frequency. An example is given in Fig. 5, with the governing equation in Eq. 4.

[0055] Output Scaling:

[0056] From Eq. 1, frequency can be independently scaled on a per-pixel basis via synapse current and/or Vth (COMP threshold) modulation. The synapse current can be modulated by Vbias and/or additional sink or source paths, depending on whether the synapse is current- or voltage-sourced, respectively. Current-sourced synapses allow for a current scaling factor from 0 to 1. Voltage-sourced synapses allow for current scaling factors greater than 1. Vth modulation works as an independent scale factor linearly affecting the output frequency.

[0057] Multi-sensor Processing:

[0058] To summarize the information presented thus far, the core LIF node is capable of operating on a wide range of information types. Any combination of voltage/current/frequency/time to frequency/time processing is achievable through configurations of the synapse and neuron. Furthermore, since the integration mechanism is the same for all modes, additional synaptic pathways controlled by disparate (or similar) sensor types can be added. These additional pathways will operate according to Kirchhoff's current law, enabling multi-sensor interaction.

[0059] An example is shown in Fig. 6. Two disparate sensors 114-1 (light sensor) and 114-2 (sound sensor) both produce current information, I1 (through synapse FET 110-1) and I2 (through synapse FET 110-2), through their synapses 10-1 and 10-2, integrated onto the same capacitor C. By Kirchhoff's current law, the comparator's frequency output is now proportional to the sum of these scaled currents, as shown in Eq. 5. The scale factors (controlled by Vbias1 and Vbias2), each between 0 and 1, can be used to indicate which sensor's data stream is more impactful. Note that Fig. 6 is a generalization of Fig. 2 for multiple sensors.
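Under the same assumed LIF form f = I/(C·Vth), the multi-sensor summation of Eq. 5 can be sketched as follows, with the scale factors standing in for the Vbias1/Vbias2 settings:

```python
def multi_sensor_frequency(currents, scales, c=1e-12, v_th=1.0):
    """Shared-capacitor summation of several scaled synapse currents
    (Kirchhoff's current law); each scale factor, set by its bias voltage,
    lies between 0 and 1. The standard f = I/(C*Vth) form is assumed,
    since Eq. 5 is not reproduced in the text."""
    assert all(0.0 <= s <= 1.0 for s in scales)
    i_total = sum(s * i for s, i in zip(scales, currents))
    return i_total / (c * v_th)
```

Raising one sensor's scale factor relative to the other makes that sensor's stream dominate the output rate.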

[0060] A sensor stream can be used to scale or directly compete against another stream, as exemplified by Figs. 7A and 7B.

[0061] In Fig. 7A, the voltage output of the sound sensor 114-2, like Vbias of Fig. 2, scales the current contribution of light sensor 114-1. The resulting scaled I1 charges the capacitor C as described in Figs. 1A and 2.

[0062] In Fig. 7B, the current information of sound sensor 114-2 (through FET 118) inhibits the comparator's firing, creating a scenario where the light sensor 114-1 must outdrive the sound sensor 114-2 to produce a firing event.

[0063] The above examples show that several linear operations, e.g., addition, subtraction, and scaling, are directly realizable via LIF variations. These operations form the foundation of 2-D convolution, a common pre-processing step in a large variety of algorithms and a near-ubiquitous step in imaging applications.

[0064] Pre-Processing Computation:

[0065] Linear, Frequency Mode:

[0066] Fig. 8, a generalization of Fig. 4A, details the configuration for a weighted, two-input frequency-mode addition. Two synapses 10-1 and 10-2, with FET 120-1 and FET 120-2, combine to drive the capacitor C. The synapses produce I1 and I2 from the same bias current Ib using two different weights w1 and w2, as shown in Eq. 6 below.

[0067] Eqs. 6 through 9 detail the mechanics of this operation. As shown, Fout is proportional to the weighted sum of Fin1 and Fin2 multiplied by a scaling factor. To allow easier handling of the equations, this operation can be modeled more simply with Eqs. 10 and 11. This simpler model is depicted in Fig. 9, in which N represents a neuron, w1 and w2 are the synapse weights, and f1 and f2 represent incoming firing rates from other neurons. Fig. 10 shows an example computation step of a 3×3 2-D convolution.
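Since Eqs. 10 and 11 are not reproduced in the text, the linear form below is an assumption: the simplified model of Fig. 9 treats each neuron as a weighted adder of firing rates with a per-neuron scale factor, here taken as the 0.18 value measured later on the test chip:

```python
def neuron(f1, f2, w1, w2, k=0.18):
    # Simplified frequency-mode addition (Fig. 9): the output rate is the
    # weighted sum of the incoming rates times a per-neuron scale factor k.
    # k = 0.18 is the measured chip value reported in paragraph [0095];
    # the linear form itself is an assumed reading of Eqs. 10-11.
    return k * (w1 * f1 + w2 * f2)
```

Chaining such neurons multiplies the scale factors, which is why the tree construction described later must compensate for the accumulated attenuation.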

[0068] Fig. 10 shows a convolution window 10-1 of size 3×3 with the convolution weights indicated in each of the squares. An arbitrary window of an image is shown in 10-2. 10-3 shows the convolution process with the window positioned on a corner pixel of the image; the corner pixel is at the center of 10-1, as shown in 10-3. 10-4 and 10-5 are neural diagrams of the convolution implementation with weights from four image pixels. 10-6 shows the final diagram with only two inputs to the neuron, as two of the four weights of the convolution window equal zero.

[0069] In addition, a mathematical algorithm can be identified and implemented in neuromorphic hardware. The key components of this process are shown in Fig. 11. Those components are: the mathematical algorithm ST1 to be instantiated in neuromorphic hardware, the hardware-optimized version of the algorithm ST2, the neuron-optimized algorithm ST3, a neural network definition ST4, the Verilog description ST5 of the algorithm, the chip design ST6, and the final neuromorphic hardware (fabricated IC) ST7. The first four steps, ST1, ST2, ST3, and ST4, are currently highly manual but are envisioned to be automated in the future.

[0070] The mathematical algorithm ST1 can be provided in a variety of forms, but preferably in some high-level language such as Matlab. Once the high-level algorithm is defined, the hardware optimization begins by identifying functional pieces of the algorithm that may not be amenable to hardware implementation. In the current process this is done manually, but in the preferred embodiment automated tools assist in this identification process. Certain mathematical processes, such as argmax, which denotes the input value where a function is maximum, or more complex mathematical functions (e.g., von Mises distributions, Bessel functions, etc.), do not lend themselves to straightforward hardware implementations. At this stage, they are replaced with hardware-"friendly" approximations or substituted by simpler functions that retain the key aspects of the algorithm (for instance, substituting the L1 norm for an L2 norm). The result of this step, ST2, is the hardware-optimized algorithm.

[0071] The disclosed neuromorphic hardware affords unique advantages in terms of power savings and computational performance. To provide the best performance of the hardware-optimized algorithm in the neuromorphic hardware, the next step looks at where in the hardware-optimized algorithm one can get further optimizations based on the performance characteristics of the neurons themselves. This can take the form of identifying areas in the algorithm that can take advantage of reduced-precision computation (such as quantizing kernel weights) and the time-mode processing inherent in neural systems. The result is a neuron-optimized version ST3 of the mathematical algorithm.

[0072] The translation from the neuron-optimized code to the neural network definition involves taking that neuron-optimized code and expanding on the capabilities of neuromorphic computing by performing calculations through chains of synthetic neurons, thereby increasing the number of inputs to the system beyond the limiting number of synapses. A detailed description of this process with a specific example is discussed with respect to scaling below. This results in a neural network definition ST4 that can be realized on neuromorphic hardware.

[0073] The translation of the neural network definition to the Verilog description ST5 begins the automated process of implementing the algorithm in hardware. The neural network definition is similar to a netlist. The Verilog tools take that netlist and translate it to a Verilog description ST5, which is reviewed and modified to produce a chip design ST6 in adherence to fabrication rules. That chip design is then fabricated, resulting in a fabricated IC chip ST7 with analog neuromorphic circuitry that implements the original mathematical algorithm.

[0074] Most techniques try to go directly from the mathematical algorithm to neuromorphic hardware without fully optimizing it. Opportunities for optimization may be missed by not performing the first four steps of the process of Fig. 11.

[0075] Others have tried to solve the problem, but their fundamental approach is different in that it is either all-digital or mixed-signal. Furthermore, they tend to go directly from the algorithm to hardware without searching for possible areas for optimization.

[0076] The analog neuromorphic hardware affords power savings and computational capabilities that other approaches (digital, mixed-signal) do not have. This affords opportunities for implementing algorithms in this analog neuromorphic hardware that other approaches do not have. Thus, the first four steps, while manual, afford optimizations that other approaches do not.

[0077] Scaling:

[0078] Currently, one neuron with a set number of synapses is used to perform one calculation, such as a 2-D convolution with a 3×3 kernel. For such a convolution, a neuron with 9 synapses is used, one for each element of the kernel. However, in some cases, the number of inputs to a system will outnumber the synapses of the available neuron. In order to expand on the capabilities of neuromorphic computing, calculations are performed through chains of synthetic neurons, thereby increasing the number of inputs to the system beyond the limiting number of synapses. Here a methodology is introduced to implement this. As a test case, the methodology computes a 2-D convolution of an image with a 3×3 kernel using only synthetic neurons with 2 synapses.

[0079] For a test case, a chip with 4 neurons, each of which has 2 synapses, was used. The neurons are labeled n0, n1, n2, and n3. The inputs and synapse weights of each neuron are configured by an FPGA. Each neuron is used for multiple calculations, and the intermediate results are stored in the FPGA to be passed as inputs to the next cycle of calculations.

[0080] To replicate the function of a single 9-synapse neuron, a tree of 2-synapse neurons is built, as shown in Fig. 12. Adding 4 inputs requires 3 neurons. For ease of tracking inputs, only 3 neurons (n0, n1, and n2) of the 4-neuron chip are used at a time. Each of the 4 square boxes represents one cycle of calculations. Four cycles are needed to calculate the 2-D convolution of a single pixel of the image. The FPGA is used to configure the synthetic neurons to perform this convolution for each pixel and record the results.
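The tree construction of Fig. 12 can be sketched as a pairwise reduction. This is a generic sketch; the actual FPGA scheduling onto neurons n0-n2 across cycles is not modeled:

```python
def pairwise_tree_sum(values):
    """Reduce a list of inputs using only 2-input additions, as in the
    tree of Fig. 12. Returns the total and the number of 2-synapse
    'neurons' (pairwise adders) consumed."""
    values = list(values)
    neurons = 0
    while len(values) > 1:
        nxt = []
        for i in range(0, len(values) - 1, 2):
            nxt.append(values[i] + values[i + 1])  # one 2-synapse neuron
            neurons += 1
        if len(values) % 2:                        # odd input carried upward
            nxt.append(values[-1])
        values = nxt
    return values[0], neurons
```

Consistent with the text, summing 4 inputs consumes 3 neurons; a 9-input (3×3 kernel) reduction consumes 8 under this carry-forward scheme.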

[0081] For each pixel, one first determines the synapse weights and inputs for each neuron based on the location of the pixel in the image and the convolution kernel. Due to edge effects, neuron inputs need to be duplicated if the filter kernel is run on edge or corner pixels. The inputs for each element of the kernel operating on a corner or edge of the image are listed below.

[0082] Fig. 13 shows the pixel numbering scheme for explaining convolution using a 3×3 convolution kernel. The corner pixels are: p0,0 = upper left; pn,0 = upper right; p0,m = lower left; and pn,m = lower right. In general, pi,j = the pixel in the ith column (0 through n) and jth row (0 through m).

[0083] For clarity, p is omitted in Fig. 13 and only the pixel indices are noted.

[0084] The side pixels of the xth column are represented as follows: px,0 = top and px,m = bottom. The side pixels of the yth row are p0,y = left and pn,y = right.

[0085] Figs. 14A-14H show pixel duplications (i.e., extra pixels not part of the image) for a 3×3 kernel for pixels at the corners and sides. The shaded pixels with dashed boxes are introduced to fill the 3×3 kernel although they are not part of the image. In the listings below, each row (3 rows with 3 column values) is separated by a semicolon (;).
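One plausible reading of the duplication rule is clamping window indices to the image border, which reproduces the shaded "extra pixels" at corners and sides; the authoritative patterns are the ones drawn in Figs. 14A-14H:

```python
def window3x3(img, i, j):
    """Gather the 3x3 neighborhood of pixel (i, j) in row-major order,
    duplicating border pixels when the window extends past the image edge
    (the shaded 'extra pixels' of Figs. 14A-14H). Index clamping is an
    assumed reading of the duplication rule, not taken verbatim from it."""
    rows, cols = len(img), len(img[0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    return [[img[clamp(i + di, rows - 1)][clamp(j + dj, cols - 1)]
             for dj in (-1, 0, 1)]
            for di in (-1, 0, 1)]
```

At the upper-left corner the first row and first column of the window are filled entirely with duplicated border pixels; interior pixels see the ordinary 3×3 neighborhood.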

[0086] - [0093] Pixel input lists, row by row, for each of the eight corner and side cases of Figs. 14A-14H (one paragraph per case).

[0094] Initially, all 9 inputs were added pair-wise without regard to whether the synapse weights were inhibitory or excitatory. However, this led to cases where a large inhibitory input was added to a small excitatory input, resulting in an output of 0, as the neurons are incapable of outputting negatives. This introduced a large amount of error. It was corrected by sorting the inputs into inhibitory and excitatory groups and performing pair-wise addition on each type of input separately. For this addition, all inhibitory inputs were treated as excitatory. Only in the final step were the inhibitory inputs negated and the subtraction performed. This ensures that large inhibitory inputs are not masked by addition with small excitatory inputs.
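The corrected ordering of paragraph [0094] amounts to the following, with the per-neuron scale factors omitted for clarity:

```python
def signed_sum(weighted_inputs):
    """Add excitatory and inhibitory contributions separately (inhibitory
    values treated as positive magnitudes during the pair-wise additions)
    and subtract only at the final step; the result is clamped at 0
    because the neurons cannot output negative rates."""
    excitatory = sum(v for v in weighted_inputs if v > 0)
    inhibitory = sum(-v for v in weighted_inputs if v < 0)
    return max(excitatory - inhibitory, 0)
```

With the naive mixed ordering, a pair like (1, -10) would clamp to 0 early and the lost inhibition could never be recovered; deferring the subtraction preserves the full inhibitory magnitude until the end.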

[0095] Each addition through a neuron also introduced a scale factor, indicated by square brackets in Eq. 9, to the output. Under the operating conditions of the chip, the scale factor equaled approximately 0.18. As the tree structure required passing an input through multiple neurons, this reduced the output magnitude by up to a factor of 10^4. To compensate, an inverse scale factor (rounded to the nearest integer) was introduced through the FPGA, and all intermediates were multiplied by this factor. To ensure that all inputs are subject to the same number of scalings, the pair-wise addition tree was rebuilt such that each input passes through 4 neurons.
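The per-intermediate compensation described above works out as follows; the link to the default "scale by 6" entry of the table in paragraph [0096] is an inference, since round(1/0.18) = 6:

```python
def compensate(intermediate, k=0.18):
    # Each pass through a neuron attenuates the output by k ~= 0.18, so each
    # intermediate is re-multiplied on the FPGA by the integer-rounded
    # inverse, round(1/0.18) = 6. The correspondence to the 'scale by 6'
    # default in the WSEL table is inferred, not stated in the text.
    return intermediate * round(1 / k)
```

Because the inverse is rounded to an integer, a small residual scaling error (6 versus 1/0.18 ≈ 5.56) remains after each compensation step.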

[0096] In addition, the synthetic neurons reported higher spike counts than expected when given lower synapse weights. In these cases, the error caused by the higher spike counts was compensated by decreasing the compensating scale factor. Given the following weight selections, the scaling factors for the output of the neuron are as follows:

• Input WSEL 0x8080, 0x80FF, 0xFF80: scale by 3

• Input WSEL 0xC080, 0x80C0: scale by 4

• Input WSEL 0xE080, 0x80E0, 0xC0C0: scale by 5

• All others: scale by 6

[0097] Additionally, a bias was found in the subtraction operation as performed by the synthetic neurons. When adding an excitatory input and an inhibitory input, the neuron requires the sinking inhibitory input to be larger than the sourcing excitatory input to produce a result of 0. At the final subtraction, the excitatory input must be scaled down to get an accurate output. This was compensated for by using the variance in the performance of the 4 neurons on the chip. Neuron n1, when given an input of between 3300 and 5000 spikes in 2 ms, undercounts by 15-20%. The final addition of the excitatory inputs was routed through this neuron to scale down the output relative to the output from the addition of the inhibitory inputs. The residual bias left after this adjustment was removed in post-processing of the image by subtracting a constant from all recorded FPGA outputs.

[0098] While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.