Title:
COMPUTATION UNIT COMPOSED OF STACKED RESISTIVE ELEMENTS
Document Type and Number:
WIPO Patent Application WO/2018/182742
Kind Code:
A1
Abstract:
An apparatus is described. The apparatus includes a computation unit. The computation unit includes stacked, resistive elements. Each of the resistive elements is coupled to its own respective input node. Each of the resistive elements is coupled to a same common node, wherein, respective resistances of the resistive elements are to be programmed into the computation unit to establish at least one of: a computation function to be performed by the computation unit; an input operand of a computation function to be performed by the computation unit.

Inventors:
KARPOV ELIJAH V (US)
FAZIO ALBERT (US)
PILLARISETTY RAVI (US)
HAMZAOGLU FATIH (US)
Application Number:
PCT/US2017/025615
Publication Date:
October 04, 2018
Filing Date:
March 31, 2017
Assignee:
INTEL CORP (US)
KARPOV ELIJAH V (US)
FAZIO ALBERT (US)
PILLARISETTY RAVI (US)
HAMZAOGLU FATIH (US)
International Classes:
G11C13/00; H01L45/00
Foreign References:
US20100325186A12010-12-23
US20150162382A12015-06-11
US20150255509A12015-09-10
US20110096588A12011-04-28
US7702105B12010-04-20
Attorney, Agent or Firm:
BRASK, Justin, K. et al. (US)
Claims:

1. An apparatus, comprising:

a computation unit comprising stacked, resistive elements each of the resistive elements being coupled to its own respective input node, each of the resistive elements being coupled to a same common node, wherein respective resistances of the resistive elements are to be programmed into the computation unit to establish at least one of:

a computation function to be performed by the computation unit;

an input operand of a computation function to be performed by the computation unit.

2. The apparatus of claim 1 wherein the stacked, resistive elements are vertically stacked.

3. The apparatus of claim 2 wherein the stacked, resistive elements are each in the shape of an annulus.

4. The apparatus of claim 3 wherein the common node resides within an opening of the annulus.

5. The apparatus of claim 1 wherein the stacked, resistive elements are laterally stacked.

6. The apparatus of claim 5 wherein the stacked, resistive elements reside on a different plane than the common node.

7. The apparatus of claim 5 wherein the stacked, resistive elements reside alongside the common node along a same plane.

8. The apparatus of claim 1 wherein the stacked resistive elements are each comprised of a material that is able to demonstrate finite and variable resistances, the material located between a first electrode and a second electrode.

9. The apparatus of claim 8 wherein the material comprises any of:

an oxide;

chalcogenide.

10. A method, comprising:

constructing a computation unit by:

monolithically stacking variable resistance elements above a semiconductor substrate;

patterning wires in the metal interconnect layering above the semiconductor substrate to couple each resistance element to its own input node;

coupling each of the resistance elements to a same common node.

11. The method of claim 10 wherein the common node runs orthogonal to the wires.

12. The method of claim 10 wherein the stacking comprises stacking the variable resistance elements above one another.

13. The method of claim 10 wherein the stacking comprises stacking the variable resistance elements along a same lateral plane.

14. The method of claim 13 wherein the stacking comprises stacking the variable resistance elements along a same lateral plane as the common node.

15. The method of claim 10 wherein the stacked resistive elements are each comprised of a material that is able to demonstrate finite and variable resistances, the material located between a first electrode and a second electrode.

16. A computing system, comprising:

a plurality of processing cores;

a system memory;

a memory controller coupled between the system memory and the plurality of processing cores;

a computation unit, the computation unit comprising stacked, resistive elements each of the resistive elements being coupled to its own respective input node, each of the resistive elements being coupled to a same common node, wherein respective resistances of the resistive elements are to be programmed into the computation unit to establish at least one of:

a computation function to be performed by the computation unit;

an input operand of a computation function to be performed by the computation unit.

17. The computing system of claim 16 further comprising a second computation unit, the second computation unit comprising a same structure as the computation unit, the second computation unit to perform a different computation function than the computation unit.

18. The computing system of claim 17 wherein a particular one of the computation units is selected by decoding an instruction.

19. The computing system of claim 18 wherein input operands of the instruction are converted into signals provided to the respective input nodes of the resistive elements of the selected one of the computation units.

20. The computing system of claim 16 wherein the computation function is one of the following:

ADD;

MUL;

DIV;

MADD.

Description:
COMPUTATION UNIT COMPOSED OF STACKED RESISTIVE ELEMENTS

Field of Invention

The field of invention pertains generally to structures for novel computational machines, and, more specifically, to a computation unit composed of stacked resistive elements.

Background

Nanometer scale manufacturing has ushered in a wave of unconventional manufactured structures, such as the storage cells of emerging three dimensional crosspoint non-volatile memory devices. Potential uses of such structures beyond data storage are also being explored.

Figures

A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:

Fig. 1 shows a computation unit;

Figs. 2a, 2b and 2c show different physical implementations of a computation unit;

Figs. 3a, 3b, 3c, 3d, 3e, 3f and 3g pertain to a first method of manufacturing a computation unit;

Figs. 4a, 4b, 4c, 4d, 4e and 4f pertain to a second method of manufacturing a computation unit;

Figs. 5a, 5b, 5c and 5d pertain to a third method of manufacturing a computation unit;

Fig. 6 shows an embodiment of a processor constructed with a plurality of computation units;

Fig. 7 shows an embodiment of a matrix multiply circuit constructed with a plurality of computation units;

Fig. 8 shows an embodiment of a computing system.

Detailed Description

Fig. 1 shows a unit cell for a linear (or analog) computation unit 100. The computation unit 100 may also process discrete input signals where, e.g., each discrete level is interpreted linearly (e.g., the computation unit can process ten discrete input signal levels where 0.1 V = 1, 0.2 V = 2, 0.3 V = 3 . . . 1.0 V = 10). As observed in Fig. 1, the computation unit 100 includes a parallel arrangement of variable resistance elements R1, R2, R3, . . . RN. In various embodiments, as described in more detail further below, the variable resistance elements are formed as electrically isolated laterally or vertically stacked variable or settable resistive memory cells such as various Metal-Insulator-Metal (MIM) structures where the insulator can be set to various finite resistances, or, variable or settable conductive cell structures where the conductive cell material can be set to various finite resistances (= 1/conductivity).
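The linear interpretation of discrete input levels described above can be sketched as follows. This is a minimal illustration that assumes ten levels spaced 0.1 V apart, as in the example; the helper names `level_to_voltage` and `voltage_to_level` and the constant `VOLTS_PER_LEVEL` are hypothetical and do not come from the description.

```python
# Hypothetical encoding helpers: ten discrete levels, 0.1 V per level (assumed).
VOLTS_PER_LEVEL = 0.1
MAX_LEVEL = 10


def level_to_voltage(level: int) -> float:
    """Map an integer level (1..10) to the input voltage that represents it."""
    if not 1 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 1 and 10")
    return level * VOLTS_PER_LEVEL


def voltage_to_level(voltage: float) -> int:
    """Map a voltage back to the nearest discrete level."""
    return round(voltage / VOLTS_PER_LEVEL)


if __name__ == "__main__":
    for level in (1, 3, 10):
        v = level_to_voltage(level)
        print(level, "->", round(v, 1), "V ->", voltage_to_level(v))
```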

The programming of the resistive elements to specific resistance values essentially programs the computation unit 100 to perform a specific type of computation and/or programs specific input operands into the computation unit 100. As such, although the variable resistive elements may be physically composed of the same or similar kinds of structure used to form the basic storage cell of, e.g., an emerging non-volatile random access memory (e.g., a resistive random access memory (RRAM), a ferroelectric random access memory (FeRAM), a phase change material (PCM) random access memory, a programmable metallization storage cell (PMC) random access memory, a conductive bridging random access memory (CBRAM) cell, or a conductive oxide random access memory (CoRAM)), the circuit 100 of Fig. 1 is not so much a traditional memory cell as it is a programmable computation unit.

An output signal I_OUT can be defined as the total sum of all currents that flow through the variable resistance elements R1 through RN in response to their respective input voltages V1 through VN. That is:

$$I_{OUT} = \sum_{i=1}^{N} \frac{V_i}{R_i} \qquad \text{(Eqn. 1)}$$

Here, each variable resistance Ri can be set to a specific resistance value that scales linearly with a set current I_SET that is driven through each resistive element individually. That is, each of the variable resistance elements R1 through RN can be set to a specific, unique resistance by applying a specific, dedicated I_SET current through each resistive element individually (e.g., the first variable resistance element R1 can be set to a first specific resistance by driving a first I_SET current through the resistive element, the second variable resistance element R2 can be set to a second, different specific resistance by applying a second I_SET current through the resistive element, etc.).
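The output current defined above, i.e., the sum of the per-element currents where each element contributes its input voltage divided by its resistance (Ohm's law), can be sketched behaviorally as follows. The sketch assumes ideal resistive elements and an ideal common node; the function name `output_current` is illustrative only.

```python
def output_current(voltages, resistances):
    """Sum of the currents V_i / R_i through N parallel resistive elements.

    Idealized model (assumption): each element is a pure resistor between
    its own input node and a grounded common node.
    """
    if len(voltages) != len(resistances):
        raise ValueError("need exactly one resistance per input voltage")
    return sum(v / r for v, r in zip(voltages, resistances))


if __name__ == "__main__":
    # Two elements: 1 V across 2 ohms plus 2 V across 4 ohms.
    print(output_current([1.0, 2.0], [2.0, 4.0]))
```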

Here, in general, the resistive element demonstrates a resulting resistance that is a function of the applied I_SET current. For example, in the case of a linear R vs. I_SET relationship, the resistive element's resulting resistance varies linearly with an applied I_SET current. Again, various, e.g., MIM structures demonstrate an appropriately repeatable relationship between resulting resistance and I_SET current to be used as a resistance element in a computation unit as described herein. The set current for each resistive element is sourced by current source circuitry through the element's corresponding input voltage node (e.g., the set current for resistive element R1 is sourced from the V1 node).
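For the linear R vs. I_SET case mentioned above, choosing the set current for a target resistance amounts to inverting a straight line. The sketch below assumes a relationship of the form R = slope x I_SET + offset; the parameters `slope` and `offset` are hypothetical device constants introduced only for illustration, and real devices would need to be characterized.

```python
def resistance_for_set_current(i_set, slope, offset=0.0):
    """Resistance produced by a set current under an assumed linear model."""
    return slope * i_set + offset


def set_current_for_resistance(target_resistance, slope, offset=0.0):
    """Invert the assumed linear model to find the I_SET for a target resistance."""
    if slope == 0:
        raise ValueError("slope must be non-zero for the model to be invertible")
    return (target_resistance - offset) / slope


if __name__ == "__main__":
    # Hypothetical constants: 5 ohms per unit of set current, 1 ohm offset.
    i_set = set_current_for_resistance(11.0, slope=5.0, offset=1.0)
    print(i_set, resistance_for_set_current(i_set, slope=5.0, offset=1.0))
```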

In various embodiments the setting of the variable resistances R1 through RN to specific resistance values essentially programs the computation unit to perform a specific computation and/or programs specific input operands into the computation unit 100. That is, comparing the computation unit 100 to an execution unit within a traditional processor, the setting of specific resistance values for the variable resistances R1 through RN corresponds to the presentation of an opcode and/or input operand(s) to the execution unit. Input operands can also be provided to the computation unit by applying specific voltages V1 through VN to the variable resistances individually.

At least ADD, MUL, DIV and MADD operations can be performed with the computation unit 100. Here, referring to Eqn. 1, note that an ADD (addition) operation can be programmed into the computation unit if all the resistance values are set to a value of unity (1). That is, if all Ri are set equal to unity, Eqn. 1 reduces to:

$$I_{OUT} = \sum_{i=1}^{N} V_i = V_1 + V_2 + \dots + V_N \qquad \text{(Eqn. 2a)}$$

Here, Eqn. 2a corresponds to a linear ADD instruction that accepts up to N linear input operands V1 through VN. That is, after initially programming all Ri = 1 to effectively program an ADD instruction into the computation unit 100, the computation unit 100 can linearly add subsequently provided input voltages that are presented at the V1 through VN inputs, respectively. Additionally, fewer than N input operands can still be processed by setting unused ones of the V1 through VN inputs equal to 0. According to Eqn. 2a, if negatively biased input signals are applied at any of the V1 through VN input nodes, the computation unit 100 will properly subtract those terms. Thus the ADD instruction is indifferent to polarity and is capable of adding positive input terms while subtracting negative input terms in a single computation.
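The ADD behavior described above can be illustrated with a short behavioral sketch: every resistance is set to unity, unused inputs are tied to 0, and negative inputs are subtracted. The function name `add_unit` and the default of eight input nodes are assumptions made for the example.

```python
def add_unit(operands, n_inputs=8):
    """Model an N-input unit programmed as ADD (all resistances equal to 1)."""
    if len(operands) > n_inputs:
        raise ValueError("more operands than input nodes")
    # Unused input nodes are driven with 0 V so they contribute no current.
    voltages = list(operands) + [0.0] * (n_inputs - len(operands))
    resistances = [1.0] * n_inputs
    return sum(v / r for v, r in zip(voltages, resistances))


if __name__ == "__main__":
    # Adds 3 and 4 while subtracting 2 in a single evaluation.
    print(add_unit([3.0, 4.0, -2.0]))
```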

The computation unit 100 can also be used to perform a scalar MUL (multiply) operation by: 1) setting I_SET to program the inverse of a selected one of the resistances equal to a multiplier value (e.g., (1/R1) = multiplier); 2) applying an input voltage to the selected resistance that is equal to a multiplicand value (e.g., V1 = multiplicand); and, 3) setting all other input voltages = 0 (e.g., each of V2 through VN are set = 0).

If the computation unit is treated as described just above, Eqn. 1 reduces to I_OUT = (multiplier) x (multiplicand) which corresponds to a scalar multiplication operation. Similarly, the computation unit 100 can be used to perform a scalar DIV (divide) operation by: 1) setting I_SET to program a selected one of the resistances equal to a divisor value (e.g., R1 = divisor); 2) applying an input voltage to the selected resistance that is equal to a numerator value (e.g., V1 = numerator); and, 3) setting all other input voltages = 0 (e.g., each of V2 through VN are set = 0). If the computation unit is treated as described just above, Eqn. 1 reduces to I_OUT = (numerator)/(divisor) which corresponds to a scalar divide operation. As seen in the aforementioned MUL and DIV operations, a number of input values are set equal to 0 in order to effect scalar operations. Owing to the parallel arrangement of the variable resistance values, however, the computation unit 100 actually behaves more naturally as a vector-like machine.
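The scalar MUL and DIV procedures described above can be sketched with the same idealized current-sum model. In this sketch the first element is the selected one, the remaining inputs are held at 0 V, and the remaining resistances are left at an arbitrary finite value of 1 (an assumption; they carry no current either way). The names `scalar_mul` and `scalar_div` are illustrative.

```python
def _current(voltages, resistances):
    # Idealized output current: sum of V_i / R_i over all elements.
    return sum(v / r for v, r in zip(voltages, resistances))


def scalar_mul(multiplier, multiplicand, n_inputs=4):
    """MUL: program 1/R1 = multiplier, apply V1 = multiplicand, others 0 V."""
    if multiplier == 0:
        raise ValueError("a zero multiplier would need an infinite resistance")
    resistances = [1.0 / multiplier] + [1.0] * (n_inputs - 1)
    voltages = [multiplicand] + [0.0] * (n_inputs - 1)
    return _current(voltages, resistances)


def scalar_div(numerator, divisor, n_inputs=4):
    """DIV: program R1 = divisor, apply V1 = numerator, others 0 V."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    resistances = [divisor] + [1.0] * (n_inputs - 1)
    voltages = [numerator] + [0.0] * (n_inputs - 1)
    return _current(voltages, resistances)


if __name__ == "__main__":
    print(scalar_mul(4.0, 2.5))
    print(scalar_div(9.0, 3.0))
```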

For example, the computation unit 100 readily performs an N dimensional MADD (multiply-add) by: 1) programming the inverse of each of the N resistances equal to a respective multiplier value (e.g., (1/Ri) = multiplier_i); and, 2) applying a respective input voltage to each of the resistances that is equal to a respective multiplicand value (e.g., Vi = multiplicand_i). If the computation unit 100 is treated as described just above, Eqn. 1 reduces to:

$$I_{OUT} = \sum_{i=1}^{N} \text{multiplier}_i \times \text{multiplicand}_i$$

which corresponds to an N wide MADD instruction. Again, fewer than N terms may be processed by setting the input node for any unwanted terms equal to zero.
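The N wide MADD described above is a dot product of the programmed multipliers and the applied multiplicands. A minimal behavioral sketch follows, assuming ideal elements and non-zero multipliers (a zero multiplier would correspond to an infinite resistance); the function name `madd` is illustrative.

```python
def madd(multipliers, multiplicands):
    """MADD: program 1/R_i = multiplier_i, apply V_i = multiplicand_i."""
    if len(multipliers) != len(multiplicands):
        raise ValueError("need one multiplicand per multiplier")
    if any(m == 0 for m in multipliers):
        raise ValueError("zero multipliers are not representable as finite R")
    resistances = [1.0 / m for m in multipliers]
    # Output current is the sum of V_i / R_i, i.e., sum of products.
    return sum(v / r for v, r in zip(multiplicands, resistances))


if __name__ == "__main__":
    # 2*3 + 4*0.5
    print(madd([2.0, 4.0], [3.0, 0.5]))
```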

Figs. 2a, 2b and 2c show exemplary physical device structures that may be fabricated on a semiconductor chip to realize the computation unit in an extremely small form factor. Here, as will be made more clear in the following discussions, the computation unit lends itself to manufacturability by emerging non-volatile memory manufacturing technologies such as three-dimensional cross point non-volatile memory manufacturing processes or three-dimensional stacked FLASH memory manufacturing processes.

Fig. 2a shows a first embodiment of a computation unit 200 whose parallel variable resistances are vertically stacked upon one another, e.g., akin to a three dimensional stacked FLASH technology. Figs. 2b and 2c show second and third embodiments of a computation unit 200 whose parallel variable resistances are positioned laterally with respect to one another, e.g., akin to a three dimensional cross point technology.

As observed in the embodiment of Fig. 2a, the vertically stacked computation unit 200 may include a central core or column 201 as its common node (the node tied to ground in Fig. 1). Surrounding the common core 201 at various positions along the height of the common core 201 is a resistive element 202 which includes, e.g., the insulator for a MIM structure and outer surrounding electrode 203. For simplicity, only the top variable resistive element is labeled in Fig. 2a.

Individual resistive cells are separated/isolated from one another by a corresponding dielectric region 204. That is, although the common core 201 runs continuously through the height of the stack, by contrast, the resistive element material(s) 202 and outer electrode material(s) 203 do not run continuously along the height of the stack. Here, layers of isolation 204 are periodically imposed into the stack structure so as to form discrete, isolated resistive elements along the stack height. Each of the outer electrodes makes contact to its corresponding input voltage node along, e.g., a wire formed along an interconnect plane that resides at the same vertical level above the semiconductor substrate as the corresponding resistive element.

For instance, the V1 node may be formed as a wire that runs along an upper interconnect metal layer and terminates where it makes contact to the outer electrode 203 of the upper (R1) resistive element. Similarly, the V2 node may be formed as a second wire that runs along an interconnect metal layer that resides beneath the R1 wire's layer. The V2 wire also terminates where it makes contact to the outer electrode of its corresponding R2 resistive element.

Fig. 2b shows a computation unit whose resistive elements are disposed laterally within the semiconductor chip interconnect metal and dielectric layering. Here, a full length wire 211 corresponds to the common node and runs the full length of the computation unit. Discrete resistive elements, including the first (R1) resistive element 212, are periodically disposed on the common node wire 211 along the run length of the common node wire 211. The individual resistive elements are electrically isolated from one another by "plugs" of dielectric material. For ease of drawing the plugs are not specifically depicted. However, the region 213 where the dielectric plug that separates the R1 and R2 resistive elements resides is shown in Fig. 2b. As depicted in Fig. 2b, the respective voltage node for each resistive element may be formed with a respective wire that runs orthogonally to the common wire 211, where each such wire runs along a same interconnect metal plane. For ease of drawing, Fig. 2b does not show vias that may drop from each such wire to a top electrode of each resistive element, nor does Fig. 2b depict the top electrode for each resistive element.

Fig. 2c shows a second laterally stacked computation unit embodiment. As observed in Fig. 2c, the resistive elements are formed on the side of the common node wire 221 (rather than on top of the common node wire 221 as depicted in Fig. 2b). Here, the individual voltage nodes may contact their respective resistive elements from top side or bottom side vertically oriented vias (Fig. 2c shows top side vertically oriented vias). As depicted in Fig. 2c, the vias may emanate from wires that run orthogonal to the common node wire 221 (Fig. 2c shows the vias dropping down vertically from wires that run orthogonal to and above the common node wire 221).

Figs. 3a through 3g depict an embodiment for manufacturing a computation unit composed of vertically stacked resistive elements, e.g., as depicted in Fig. 2a. Referring to Fig. 3a, a multilayer stack 300 of alternating conductive (M) and dielectric (D) layers is initially formed on a semiconductor substrate (for ease of drawing the semiconductor substrate is not shown). In various embodiments, the conductive layers are composed of a metal, metal alloy or degenerately doped polysilicon. The conducting and dielectric layers may be formed, e.g., by chemical vapor deposition (CVD).

As observed in Fig. 3b, vias 301 are etched into the stack 300. The vias may be formed, e.g., by covering the stack with resist and patterning the resist through lithographic techniques that expose holes in the resist where the vias 301 are to be formed. Then, an anisotropic reactive ion etch or plasma etch having a larger vertical etch rate than horizontal etch rate is performed to physically create the vias 301 in the stack. Notably, the etch reacts with both the conductive and dielectric materials in the stack 300 in order to remove both layers while forming the vias.

After formation of the vias 301, as observed in Fig. 3c, a selective isotropic etch (e.g., a wet chemical etch) is performed that is selective to the conductive layers. As such, the conductive layering is recessed from the via walls thereby forming gaps 301 in the via walls.

Referring to Fig. 3d, after the selective etch of the conductive layers is performed, an outer electrode layer 303 for the resistance element(s) is formed on the via sidewalls through, e.g., atomic layer deposition. The outer electrode layer may be composed of any of various suitable resistive element electrode metals. Referring briefly back to Fig. 2a, note that the outer electrode layer 303 will eventually form the outer electrode for each of the vertically stacked resistive elements (outer electrode 203 of the R1 resistive element is explicitly labeled in Fig. 2a).

As observed in Fig. 3e, after the formation of the outer electrode 303, the resistive element material(s) 304 are formed over the outer electrode 303 inner wall surface of the vias. In various embodiments, the resistive element material(s) 304 are formed with a lateral thickness (measured along the x axis) that is greater than the depth of the gaps 301 (less the outer electrode 303 thickness). As such, a lining of resistive element material(s) 304 is formed over the inner via walls (not shown in Fig. 3e for illustrative ease). With increased thickness, the resistive element via wall lining becomes more smooth along the via's inner walls (more like a uniform inner wall coating), while, with lesser thickness, the resistive element material(s) formed on the dielectric layers may jut out more into the via cavity than from the conductive layers (e.g., which primarily fill the gaps).

Regardless, in various embodiments, after the resistive element material(s) 304 are formed and line the inner via walls, an anisotropic etch that etches the material(s) more vertically than laterally is performed so as to substantially remove any resistive element material(s) formed on the dielectric layers of the inner via walls but not remove resistive element material(s) 304 formed in the gaps of the inner via walls. The resulting structure is depicted in Fig. 3e. Here, note that the resistive element material(s) 304 correspond to the resistive element materials of the vertical cell stack observed in Fig. 2a (where the resistive element material(s) 202 of the R1 resistive element is explicitly labeled in Fig. 2a).

Referring to Fig. 3f, what remains of the via openings is filled with inner electrode 305 material. Here, the inner electrode material 305 may be any conductive material sufficient for forming a resistive element electrode. Note that the inner electrode 305 corresponds to the inner electrode, or common node 201, of the resistive element stack of Fig. 2a. Here, comparing Fig. 2a with Fig. 3f, note that the conductive layers of the stack in Fig. 3f correspond to the individual input voltage wires (e.g., the V1 wire for the R1 element in Fig. 2a) that respectively make contact to the outer electrode 203/303 of each resistive element and the dielectric layers of the stack correspond to the dielectric that is imposed between neighboring resistive elements on the stack (e.g., dielectric 204 of Fig. 2a).

In an embodiment in which the resistive elements resemble RRAM storage cells, the outer electrode 303 is composed of Titanium Nitride (e.g., TiN), the resistive element material(s) 304 are composed of a first layer that is in contact with and formed on the outer electrode 303 and a second layer that is in contact with and formed on the first layer, where the first layer is composed of a stoichiometric oxide layer (composed of, e.g., a Hafnium Oxide (HfO2), a Hafnium Tantalum Oxide (HfTaOx), a Hafnium Aluminum Oxide (HfAlOx), a Tantalum Oxide (Ta2O5) or other combinations of transition metal oxides) and the second layer is an oxygen exchange layer (OEL) (composed, e.g., of Tantalum (Ta), Titanium (Ti), or Hafnium (Hf)). The common node 305 may also be composed of TiN. In various embodiments, the stoichiometric oxide layer may be deposited by atomic layer deposition (ALD). The outer electrode 303 and common node 305 can be deposited by ALD or sputtering (DC or reactive). In various embodiments, the thickness of the outer electrode 303 and the first and second layer of the resistive element are all within a range of 2-30 nm.

In another embodiment where the resistive elements resemble FeRAM storage cells, the outer electrode 303 is composed of an inert electrode such as TiN or TaN or Ru, the resistive element material 304 is composed of doped hafnium oxide (HfO) (where the dopant induces the resistive element material 304 to behave in a ferroelectric manner) and the common node 305 is composed of TiN, TaN or Ru. In an embodiment, after formation of the outer electrode 303, a first "outer" inert barrier layer is formed on the outer electrode 303 and the resistive element material 304 is formed on the outer inert barrier layer. Then, another "inner" inert barrier layer is formed on the resistive element material 304 and the common node 305 is formed on the inner inert barrier layer. In various embodiments each of the outer electrode 303, outer inert barrier layer, resistive element material and inner inert barrier layer may be formed by atomic layer deposition. The common node 305 may be formed by atomic layer deposition or other kinds of deposition (e.g., sputtered (DC or reactive), chemical vapor, etc.). In various embodiments, the thickness of the outer electrode 303, the barrier layers and the resistive element are all within a range of 2-30 nm.

In another embodiment the resistive elements 304 are composed of chalcogenide or otherwise resemble a phase change storage cell which includes phase change material such as Ge2Sb2Te5. In another embodiment the resistive elements 304 resemble a programmable metallization storage cell (PMC) or a conductive bridging random access memory (CBRAM) cell. In another embodiment the resistive elements 304 resemble conductive oxide random access memory (CoRAM) storage cells which include conductive oxide such as ITO (indium tin oxide).

Fig. 3g shows a 3D NAND FLASH like staircase structure for interconnecting to the computation units. Here, e.g., computation units 307 of Fig. 3g correspond to the pair of vertically stacked computation units of Fig. 3f. As observed in Fig. 3g, after the structure of Fig. 3f is formed, respective common node wires 308 are run over and make contact to the inner electrodes 305. As depicted in Fig. 3g, the common node wires run laterally along the z axis.

The stack is also formed into a staircase structure to provide connectivity to the individual resistive elements. Here, each higher conductive layer and corresponding inter layer dielectric is etched through closer to the resistive element stack than its lower conductive/dielectric layers. By so doing, lands are formed at the staircase edges for vias to drop down and make contact to each individual voltage input wire that the stack's conductive layers correspond to. For ease of drawing, Fig. 3g has only labeled one such via 309 which makes contact to the input voltage wire for the resistive element on the bottom of the stack. The vias run to higher level wiring above layer 310 and individually connect to individual wires that provide the individual voltage inputs for the individual resistive elements. The number of vias that can be packed along the z axis helps to determine how many computation units can be packed along the z axis at least.

Note that in a true cross-point approach in which only the vertically stacked resistive elements that are in contact with a same common node wire (e.g., beneath one of common node wires 308 in Fig. 3g) can be activated (e.g., by grounding the one common node wire while placing a higher voltage on the other common node wires), individual input voltage wires are permitted to couple to more than one resistive element along a same plane. Fig. 3g shows this particular approach wherein each conductive layer is in contact with both resistive elements that reside at its particular plane (e.g., the M1 conductive layer is in contact with the bottom resistive element of both computation unit stacks).

In alternate embodiments in which a true cross-point architecture is not implemented, each conductive layer may be processed/etched to isolate computation stacks and corresponding conductive layers along a same plane. For example, referring to Fig. 3g, each of the conductive planes could be etched along plane 306 to isolate the left conductive layers and computation unit from the right side conductive layers and computation unit.

Finally, note that the overall structure of Fig. 3g can be repeated either or both laterally or vertically to produce high densities of such computation units on a same semiconductor die.

Figs. 4a through 4f show another embodiment of a manufacturing process for manufacturing a computation unit in accordance with the structural embodiment of Fig. 2b. Referring to Fig. 4a, initially, a layer of dielectric is disposed on a semiconductor substrate (for ease of drawing, the semiconductor substrate is not shown). Referring to Fig. 4b, an intra-layer dielectric is formed on the first dielectric and a common node bar 401 for a computation unit is formed in the intra-layer dielectric at a first conductive layer (M1). The common node bar 401 may be formed by patterning the intra-layer dielectric with photoresist so as to expose an upper surface area of the intra-layer dielectric where the bar is to be formed, etching the exposed area to form an opening in the intra-layer dielectric for the bar, and depositing, e.g., a conductive material (e.g., a metal, a metal alloy, degenerately doped poly-silicon, etc.) in the opening.

Referring to Fig. 4c, a second layer of dielectric D2 is formed over the structure of Fig. 4b. Referring to Fig. 4c, the second layer of dielectric D2 is patterned with photoresist and openings are formed therein. Resistive elements 402 are then formed in the openings. The resistive elements 402 may be composed of any of the resistive element structures 304 discussed above with respect to Fig. 3e except that they are formed vertically in Fig. 4c rather than laterally as in Fig. 3e. Referring to Fig. 4d, a next intra-layer dielectric 403 is formed over the resistive elements. Then, as observed in Fig. 4e, the individual input voltage wires 404 are formed in the next intra-layer dielectric 403. Here, each of the individual input voltage wires 404 runs laterally along the z axis. As such, only their respective cross sections are observable in Fig. 4f.

As with the computation unit structure of Fig. 3g, the computation unit of the overall structure of Fig. 4f can be repeated either or both laterally or vertically to produce high densities of such computation units on a same semiconductor die.

Figs. 5a through 5d show another manufacturing embodiment for forming a computation unit as described above with respect to Fig. 2c. Comparing the coordinate axis of Fig. 5a with the coordinate axis of any of Figs. 4a through 4f, note that the perspective view of Fig. 5a is oriented 90° with respect to the perspective view of Figs. 4a through 4f. As such, unlike Fig. 4a which shows a side view of a single common node bar 401, by contrast, Fig. 5a shows a cross sectional view of multiple common node bars 501 formed in an intra-layer dielectric of a first conductive layer (M1) according to, e.g., patterning with photoresist according to photolithographic techniques.

Referring to Fig. 5b, the intra-layer dielectric is again coated with photoresist and patterned to form openings in the intra-layer dielectric where resistive elements 502 are formed. The resistive elements 502 are each abutted next to a corresponding common node bar to place the resistive elements in contact with their respective common node bar. Again, owing to the perspective of Fig. 5b, only a cross section of a single resistive element is depicted abutted next to a common node cross section. Each resistive element cross section that is depicted in Fig. 5b, however, includes a corresponding number of resistive elements "behind it" that are in contact with the same corresponding common node bar. That is, as depicted in Fig. 5b, a number of resistive elements are formed behind each resistive element cross section along the x axis. Fig. 4d could be understood to also depict a representation of one such series of resistive elements (although Fig. 4d shows them along the D2 level rather than the M1 level).

Referring to Fig. 5c, a second inter-layer dielectric D2 is formed over the structure of Fig. 5b. Referring to Fig. 5d, a first voltage input wire 503 is formed at a next conductive layer which makes contact to the first resistive element 504. Here, again, a number of such wires reside "behind" wire 503 and respectively make contact to the resistive elements that are behind the first resistive element 504.

If each individual resistive element is to be coupled to its own respective wire, another wire could be formed at a higher metal layer that makes contact to the middle resistive element of Fig. 5c, etc. By contrast, in a true cross-point approach in which only the resistive elements that are in contact with a same common node wire (e.g., abutted next to a common node wire) can be activated (e.g., by grounding the one common node wire while placing a higher voltage on the other common node wires), individual input voltage wires are permitted to couple to more than one resistive element along a same plane. As such, the individual wire 503 of Fig. 5d could run across the entire M2 plane and make contact to each of the depicted resistive element cross sections with corresponding vias. Although not visible, a similar approach could be undertaken with respect to the embodiment of Fig. 4f described above.

Fig. 6 shows an embodiment for a processor 601 composed of a plurality of computation units 604, 605, 606, 607. Here, each of the computation units may be programmed (e.g., during a configuration programming stage) to set the individual resistive elements to a specific kind of computation.

For example, during a configuration programming stage: 1) the resistive elements of computation unit 604 may be programmed to perform an ADD; 2) the resistive elements of computation unit 605 may be programmed to perform a linear MUL; 3) the resistive elements of computation unit 606 may be programmed to perform a linear DIV; and, 4) the resistive elements of computation unit 607 may be programmed to perform a MADD.

Each computation unit 604, 605, 606, 607 is architecturally positioned along its own row (computation unit 604 is positioned along row 1; computation unit 605 is positioned along row 2; computation unit 606 is positioned along row 3; computation unit 607 is positioned along row 4) where each row corresponds to a particular computation unit's common node. By contrast, the individual input voltage wires to the computation units are shared by the computation units along columns that are orthogonal to the rows. Here, circuitry 602 corresponds to the row select circuits for the individual rows, whereas circuitry 603 corresponds to the individual voltage inputs for the individual columns.

When programming a particular computation unit, its row select is activated (e.g., by being driven to a first potential such as ground (GND)) by its corresponding row select circuit, whereas the row selects for the other computation units are driven to a second potential such as VCC by their corresponding row select circuits. For example, if the first computation unit 604 is being programmed, the r1 row select is set to GND and the other row selects r2, r3 and r4 are set to VCC. A corresponding I_SET current is then driven through the individual voltage input nodes to set the corresponding resistive elements of computation unit 604.

Here, if I_SET is provided with an elevated voltage being applied at a corresponding column input, with the common node of computation unit 604 being set to GND and the other computation units 605, 606, 607 having their common nodes set to a higher VCC voltage, the I_SET current will only flow through computation unit 604 and not any of computation units 605, 606, 607. As such, only the resistive elements of computation unit 604 are precisely controlled to perform a particular computation. The same can also be applied to each of computation units 605, 606 and 607 in succession so that each is programmed to perform a specific computation.
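The row selection described above can be sketched in Python as a simple potential-difference calculation. This is only an illustrative model: the voltage values, the assumption that the elevated column voltage equals VCC, and the function names are hypothetical and are not taken from the specification.

```python
GND = 0.0  # assumed potential of the selected row's common node
VCC = 1.0  # assumed potential of the unselected rows (illustrative value)


def row_select_levels(selected_row, n_rows):
    """Return the common-node potential applied to each row."""
    return [GND if row == selected_row else VCC for row in range(n_rows)]


def programming_drops(selected_row, n_rows, column_voltages):
    """Return drops[row][col], the voltage across each resistive element."""
    levels = row_select_levels(selected_row, n_rows)
    return [[v - level for v in column_voltages] for level in levels]


# Program the first computation unit (row index 0) with every column at VCC.
drops = programming_drops(0, 4, [VCC, VCC, VCC])
for row, row_drops in enumerate(drops):
    print(row, row_drops)
```

Under these assumptions only the selected row sees a non-zero voltage across its resistive elements, so a programming current would flow only through that row.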

After the processor 601 is programmed, it may receive a stream of instructions to process, where the instructions identify a particular operation to be performed (e.g., ADD, MUL, DIV, MADD) and input operands to be processed by the operation. Upon executing a particular instruction, the instruction's opcode is decoded to activate the particular row that corresponds to the particular opcode (e.g., if the opcode corresponds to an ADD, row r1 is activated). The input voltages are then applied along the columns and the selected processing unit performs its operation on the input voltages. A current meter 608 detects the output current from the computation to interpret the result.
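The decode-and-execute flow just described can be sketched in Python as follows. The sketch assumes that the current at the selected common node is the sum of each input voltage divided by its element's resistance, which is taken here as the presumed form of Eqn. 1. The opcode-to-row mapping beyond ADD, the resistance values, and all names are hypothetical placeholders.

```python
# Hypothetical mapping; the text only states that ADD activates the first row.
OPCODE_TO_ROW = {"ADD": 0, "MUL": 1, "DIV": 2, "MADD": 3}


def execute(opcode, input_voltages, resistances_by_row):
    """Decode the opcode, select its row, and return the common-node current.

    Assumes the selected common node is grounded, so each resistive element
    carries a current of V_i / R_i and the meter reads their sum.
    """
    row = OPCODE_TO_ROW[opcode]
    resistances = resistances_by_row[row]
    return sum(v / r for v, r in zip(input_voltages, resistances))


# Placeholder resistances per row; only row 0 (all ones) models an ADD.
resistances_by_row = [
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0],
    [0.5, 0.25, 1.0],
]

result = execute("ADD", [0.25, 0.5, 1.0], resistances_by_row)
print(result)
```

With unit resistances in the selected row, the modeled output current equals the sum of the input voltages.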

In various embodiments, the column select inputs may be preceded by digital-to-analog converter (DAC) circuits that generate a specific analog voltage in response to an instruction that specifies its input voltages digitally. Likewise, an analog-to-digital converter (ADC) circuit may follow the current meter 608 to provide the processing output in a digital format.
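A minimal Python sketch of such conversion stages is given below, assuming ideal linear converters. The 8-bit resolution, the reference voltage, the full-scale value, and the function names are hypothetical and serve only to illustrate the digital-to-analog and analog-to-digital steps.

```python
def dac(code, bits=8, v_ref=1.0):
    """Map a digital code in [0, 2**bits - 1] to a voltage in [0, v_ref]."""
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError("code out of range")
    return v_ref * code / max_code


def adc(value, bits=8, full_scale=1.0):
    """Quantize an analog reading in [0, full_scale] to the nearest code."""
    max_code = (1 << bits) - 1
    clamped = min(max(value, 0.0), full_scale)  # saturate out-of-range inputs
    return int(clamped / full_scale * max_code + 0.5)


# Round trip: a digital operand becomes a voltage and is read back as a code.
voltage = dac(100)
code = adc(voltage)
print(voltage, code)
```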

It is worthwhile to point out that the voltage levels described above (e.g., GND and VCC) are only exemplary and other voltage levels may be used. Additionally, the circuit conceivably could accept individual current inputs (instead of individual voltage inputs) and interpret the processing output with a voltage meter (rather than a current meter). Further still, any inherent offsets or factors from a correct processing result could be handled with associated circuitry that compensates for the offset or factor. For example, recalling that Eqn. 1 reduces to an ADD if each Ri is set equal to 1, it may not be possible to actually set each resistive element equal to, e.g., 1 ohm. Instead, for instance, each resistive element may be set to 10 kOhms and a voltage amplifier is coupled between the computation unit common node and true GND having an amplification of 10k. The voltage amplifier essentially normalizes the output "as if" each resistance element was set to 1 Ohm. The circuitry for the voltage amplifier may be, e.g., disposed in the semiconductor substrate beneath the computation unit.
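The normalization in this example can be checked numerically with the short Python sketch below. It assumes that the common-node current is the sum of each input voltage divided by the element resistance (the presumed form of Eqn. 1) and models the amplifier as an ideal gain stage. The function names and input values are hypothetical.

```python
def common_node_current(voltages, resistance_ohms):
    """Sum of V_i / R for equal-valued resistive elements."""
    return sum(v / resistance_ohms for v in voltages)


def normalized_output(voltages, resistance_ohms=10_000.0, gain=10_000.0):
    """Scale the raw output by the amplifier gain to undo the 10 kOhm factor."""
    return gain * common_node_current(voltages, resistance_ohms)


inputs = [0.5, 0.25, 0.125]
raw = common_node_current(inputs, 10_000.0)  # roughly 8.75e-5
scaled = normalized_output(inputs)           # roughly 0.875, the plain sum
print(raw, scaled)
```

In this model a gain equal to the element resistance recovers the same value that 1 Ohm elements would have produced, namely the sum of the inputs.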

Groups of computation units can also be integrated or electrically combined to perform larger, more complex processing tasks. For example, consider the multiplication of two 3x3 matrices. Here, the matrix multiplication expands to nine different MADD operations. That is, for each of the three rows of a first matrix, the row's three elements are computed in: 1) a first MADD operation with the three elements of the first column of the second matrix; 2) a second MADD operation with the three elements of the second column of the second matrix; and, 3) a third MADD operation with the three elements of the third column of the second matrix.

In a resultant matrix, the results of the three MADD operations for the first row of the first matrix become the three elements of a top row of the resultant matrix, the results of the three MADD operations for the second row of the first matrix become the three elements of a middle row of the resultant matrix and the results of the three MADD operations for the third row of the first matrix become the three elements of a bottom row of the resultant matrix. Each of the nine MADD operations could be performed with nine separate computation units that perform the respective MADD computations. The nine MADD computation units could even be oriented in an array that corresponds to the resultant matrix. For example, the computation units for the top row of the resultant matrix could be aligned along a top row of a computation unit array, the computation units for the middle row could be aligned along a middle row of a computation unit array and the computation units for the bottom row could be aligned along a bottom row of a computation unit array.
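The decomposition of a 3x3 matrix multiplication into nine MADD operations can be illustrated with the following Python sketch. Each call to the hypothetical madd function stands in for one computation unit; the function names and the sample matrices are illustrative only and say nothing about how a unit is physically programmed.

```python
def madd(a_row, b_col):
    """One multiply-add: the dot product of a row of A with a column of B."""
    return sum(a * b for a, b in zip(a_row, b_col))


def matmul_3x3(A, B):
    """Build the resultant matrix from nine MADDs, one per (row, column) pair."""
    return [
        [madd(A[i], [B[k][j] for k in range(3)]) for j in range(3)]
        for i in range(3)
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
C = matmul_3x3(A, B)
for row in C:
    print(row)
```

The position (i, j) of each MADD in the nested list mirrors the array arrangement described above, in which the computation unit for row i of the first matrix and column j of the second matrix sits at row i, column j of the computation unit array.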

Individual voltage inputs could conceivably be shared across multiple computation units. That is, note that groups of three input elements from either input matrix are processed by three computation units. Here, the input voltage wires for any such group of three elements could, e.g., be sent to and shared by the three computation units that process them. The wiring could be arranged in the metal layers of a semiconductor chip in a straightforward manner. Fig. 7 shows an illustration of such a semiconductor chip. Here, each depicted circle corresponds to a computation unit and "dotted" intersections between matrix A terms and matrix B terms within any circle define each MADD operation for that computation unit. Each computation unit may have its own associated output and current meter so the elements of the resultant matrix can be individually provided.

Fig. 8 shows a depiction of an exemplary computing system 800 such as a personal computing system (e.g., desktop or laptop) or a mobile or handheld computing system such as a tablet device or smartphone, or, a larger computing system such as a server computing system. The computing system may include a three dimensional memory having storage nodes of varying design moving down the storage cell stack as described at length above.

As observed in Fig. 8, the basic computing system may include a central processing unit 801 (which may include, e.g., a plurality of general purpose processing cores and a main memory controller disposed on an applications processor or multi-core processor), system memory 802, a display 803 (e.g., touchscreen, flat-panel), a local wired point-to-point link (e.g., USB) interface 804, various network I/O functions 805 (such as an Ethernet interface and/or cellular modem subsystem), a wireless local area network (e.g., WiFi) interface 806, a wireless point-to-point link (e.g., Bluetooth) interface 807 and a Global Positioning System interface 808, various sensors 809_1 through 809_N (e.g., one or more of a gyroscope, an accelerometer, a magnetometer, a temperature sensor, a pressure sensor, a humidity sensor, etc.), a camera 810, a battery 811, a power management control unit 812, a speaker and microphone 813 and an audio coder/decoder 814. An applications processor or multi-core processor 850 may include one or more general purpose processing cores 815 within its CPU 801, one or more graphical processing units 816, a memory management function 817 (e.g., a memory controller) and an I/O control function 818. The general purpose processing cores 815 typically execute the operating system and application software of the computing system. The graphics processing units 816 typically execute graphics intensive functions to, e.g., generate graphics information that is presented on the display 803. The memory control function 817 interfaces with the system memory 802. The system memory 802 may be a multi-level system memory.

Each of the touchscreen display 803, the communication interfaces 804 - 807, the GPS interface 808, the sensors 809, the camera 810, and the speaker/microphone codec 813, 814 all can be viewed as various forms of I/O (input and/or output) relative to the overall computing system including, where appropriate, an integrated peripheral device as well (e.g., the camera 810). Depending on implementation, various ones of these I/O components may be integrated on the applications processor/multi-core processor 850 or may be located off the die or outside the package of the applications processor/multi-core processor 850. Non volatile storage 820 may hold the BIOS and/or firmware of the computing system.

Although the CPU 801 and/or GPU 816 can conceivably make use of the computation unit structures described above because they are also circuits that perform computations that can be programmed, conceivably, any of the individual components may make use of the computation units described above as, e.g., a local processing resource. For instance, the computation units described above may be embedded in a non volatile memory (including, e.g., system memory 802 that contains non volatile memory), so that the non volatile memory itself can perform localized processing functions (such as the execution of a matrix multiply as described above) and, by so doing, offload the CPU 801 from having to perform the function.

Embodiments of the invention may include various processes as set forth above. The processes may be embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor to perform certain processes. Alternatively, these processes may be performed by specific hardware components that contain hardwired logic for performing the processes, or by any combination of programmed computer components and custom hardware components.

Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of media/machine-readable medium suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).

An apparatus has been described above. The apparatus includes a computation unit having stacked, resistive elements. Each of the resistive elements is coupled to its own respective input node. Each of the resistive elements is coupled to a same common node, wherein respective resistances of the resistive elements are to be programmed into the computation unit to establish at least one of:

a computation function to be performed by the computation unit; an input operand of a computation function to be performed by the computation unit.

In various embodiments the stacked, resistive elements are vertically stacked. In various embodiments the stacked, resistive elements are each in the shape of an annulus. In one or more further embodiments the common node resides within an opening of the annulus.

In various embodiments the stacked, resistive elements are laterally stacked. In further embodiments the stacked, resistive elements reside on a different plane than the common node. In other further embodiments the stacked, resistive elements reside alongside the common node along a same plane.

In various embodiments the stacked resistive elements are each comprised of a material that is able to demonstrate finite and variable resistances, where the material is located between a first electrode and a second electrode. In further embodiments the material comprises any of: an oxide; a chalcogenide.

A method has also been described that includes constructing a computation unit by: monolithically stacking variable resistance elements above a semiconductor substrate; patterning wires in the metal interconnect layering above the semiconductor substrate to couple each resistance element to its own input node; and coupling each of the resistance elements to a same common node.

In various embodiments the common node runs orthogonal to the wires. In various embodiments the stacking includes stacking the variable resistance elements above one another. In various embodiments the stacking includes stacking the variable resistance elements along a same lateral plane. In various further embodiments the stacking includes stacking the variable resistance elements along a same lateral plane as the common node. In various embodiments the stacked resistive elements are each comprised of a material that is able to demonstrate finite and variable resistances, the material located between a first electrode and a second electrode.

A computing system has been described. The computing system includes a plurality of processing cores, a system memory, a memory controller coupled between the system memory and the plurality of processing cores and a computation unit. The computation unit includes stacked, resistive elements where each of the resistive elements is coupled to its own respective input node. Each of the resistive elements is also coupled to a same common node, wherein respective resistances of the resistive elements are to be programmed into the computation unit to establish at least one of: a computation function to be performed by the computation unit; an input operand of a computation function to be performed by the computation unit.

In further embodiments the computing system further includes a second computation unit. The second computation unit includes a same structure as the computation unit. The second computation unit is to perform a different computation function than the computation unit. In further embodiments a particular one of the computation units is selected by decoding an instruction. In further embodiments input operands of the instruction are converted into signals provided to the respective input nodes of the resistive elements of the selected one of the computation units. In various embodiments the computation function is one of the following: ADD; MUL; DIV; MADD.

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.