Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicant's Amendment filed on August 23, 2022, in which claims 22, 26-27, and 29-30 are currently amended. Claims 32-35 are newly added. Claims 1-6, 11-12, 17, 21, and 23-35 are currently pending.

Response to Arguments
The rejection of claim 22 under 35 U.S.C. § 112(a) is hereby withdrawn in view of Applicant's amendments and remarks.
Applicant’s arguments with respect to the rejection of claims 1-6, 11-12, 17, 21, and 23-35 under 35 U.S.C. 103, in view of the amendment, have been considered but are not persuasive.
With respect to Applicant's argument that Park does not teach, or at the very least does not render obvious, implementing the proposed device as a 3D NAND device, Examiner respectfully disagrees. Park is directed specifically to charge-trap flash memory ([p. 420 §I] "we propose a synaptic device based on a charge-trap flash (CTF) memory. Like FG memory, it has already been commercialized in NAND flash memory"). Park repeatedly draws similarities between the proposed device and existing NAND-based technology, including 3D stacked NAND ([p. 425 §IV] "The proposed array structure is similar to commercialized 3-D stacked NAND flash memory"). Park, however, does not draw similarities between the proposed device and existing NOR flash implementations, nor does Park explicitly teach that the CTF memory is built using NOR flash. One of ordinary skill in the art would recognize that the names NOR flash and NAND flash derive from the logic gate that the arrangement of the memory cells resembles, and that NOR flash typically outperforms NAND flash in random read speed. Examiner asserts that it would be inappropriate to assume, based on the mention of upper performance bounds for a particular type of flash cell, that the CTF cell in Park was meant to be implemented exclusively using NOR flash. It is abundantly clear from the disclosure of Park that the proposed invention is based on existing NAND flash devices, and it would have been obvious to one of ordinary skill in the art that the proposed circuit could be implemented as such (i.e., in NAND). Examiner also respectfully notes that Applicant argues on p. 14 of the submitted remarks "Generally speaking, NOR NVM devices may be operated like RAM devices, whereas NAND NVM devices cannot, and so modifying a NOR NVM device to instead operate as a NAND NVM device". Based on this argument, it would seem contradictory to one of ordinary skill in the art that a 3D NAND fabrication process could be used to fabricate a NOR device while a NOR fabrication process could not be used to fabricate a NAND device, as is implied by Applicant's argument. Finally, nowhere in the disclosure of Park is it stated or reasonably implied that the 3D NAND fabrication process is used to manufacture exclusively a NOR-based device. While the Examiner maintains that the device in Park is implemented using NAND cells, at best the mention of NOR flash in Table 1 is seen as support for the obviousness of substituting NAND flash for NOR flash.
With respect to Applicant's argument that Park does not teach each layer being stored in a separate NAND block, Examiner respectfully disagrees. One of ordinary skill in the art would recognize that a NAND block is a layered subset of the cell array. Park explicitly teaches that the 3D memory structure is a series of stacked synapse (NVM NAND or NOR cell) arrays in which each synapse (cell) array corresponds to a layer of the neural network ([p. 424 §III] "The specific configuration based on the proposed synapse device is shown in Fig. 8. The synapse arrays corresponding to each layer of the DNN are stacked vertically"). A NAND-based synapse array is interpreted as synonymous with a NAND block. Applicant has not provided a rationale explaining how the layer-wise cell array in Park differs from the layer-wise cell array in the claimed invention. For this reason, the argument is seen as a general allegation of patentability.
With respect to Applicant's argument that the combination of Park and Huang would not have been obvious, Examiner respectfully disagrees. Both references are directed towards 3D stacked memory for accelerated neuromorphic circuits and are highly analogous. Both Park and Huang utilize memory word lines to perform artificial neural network multiplication operations. Examiner also notes that the use of multiply-accumulate units and multiplexers is well known in the art, and it would have been obvious to one of ordinary skill in the art before the effective filing date that said components could be used in a neuromorphic circuit. This is further reinforced by the disclosure of Huang.
With respect to Applicant's argument that the fold operation in Huang is somehow different from an NVM fold operation, Examiner respectfully disagrees. One of ordinary skill in the art would recognize that the fold operation of the prior art is analogous to the claimed fold operation. This argument is seen as a general allegation of patentability.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


	Claims 1-2, 5-6, 11-12, 17, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Park (“3-D Stacked Synapse Array Based on Charge-Trap Flash Memory for Implementation of Deep Neural Networks”, 2018) in view of Huang (“LTNN: An Energy-efficient Machine Learning Accelerator on 3D CMOS-RRAM for Layer-wise Tensorized Neural Network”, 2017).

	Regarding claim 1, Park teaches An apparatus comprising: a die comprising non-volatile memory (NVM) elements formed in the die and arranged in a plurality of NAND blocks, each comprising a plurality of word lines; ([Abstract] "This paper proposes a synaptic device based on charge-trap flash memory...we also propose a 3-D stacked synapse array and present the structure, operation, and process methods" Flash memory interpreted as synonymous with non-volatile memory.)
	a plurality of neural network processing circuits formed in the die and configured to access synaptic weight values in parallel ([p. 420 §I] "A neuromorphic system is characterized by massively parallel architecture connecting myriad computing elements (neurons) and adaptive memory elements (synapses). Each synapse has its own synaptic weight, which refers to the connection strength between neurons." See also FIG. 7. [p. 425 §IV] "3D-NAND process techniques can be used to fabricate the proposed 3-D stacked synapse array, such as the punch and plug process for channel formation [30] and the gate replacement process for metal gate formation [31]" The synapse array is interpreted as synonymous with the plurality of neural network processing circuits. The fabrication process describes die formation. See also FIG. 10 for the fabrication process showing formation in the die.)
	from the word lines of a particular NAND block ([p. 425 §IV] "The proposed array structure is similar to commercialized 3-D stacked NAND flash memory in which WLs are vertically stacked in both structures")
	and wherein each neural network layer is stored in a separate NAND block. ("The specific configuration based on the proposed synapse device is shown in Fig. 8. The synapse arrays corresponding to each layer of the DNN are stacked vertically" See FIG. 9(a) "WL connection design of 3-D stacked synapse array architecture and its selective operation method for each layer" [p. 424 §III]).
	However, Park does not explicitly teach perform neural network operations in parallel using the synaptic weight values, the plurality of neural network processing circuits comprising multiplexers (MUXes) and multiply-accumulate (MAC) circuits,
	with the MUXes configured to route particular synaptic weight values to particular MAC circuits in accordance with a particular MUX connectivity configuration 
	and a MUX connectivity configuration circuit formed in the die and configured to determine the particular MUX connectivity configuration for different layers of a neural network. 

Huang, in the same field of endeavor, teaches and perform neural network operations in parallel using the synaptic weight values, ([p. 283 §IIID] "Secondly, a tensorization of weight matrix can decompose the big matrix into many small tensor-core matrices, which can effectively reduce the configuration time of RRAM. Lastly, the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time")
	the plurality of neural network processing circuits comprising multiplexers (MUXes) and multiply-accumulate (MAC) circuits ([p. 283 §IIIC] "The detailed design of a tensor core is also shown in Fig. 3. In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used" [p. 282 §IIIA] "In one RRAM-crossbar, given the input probing voltage, the current on each bit-line (BL) is the multiplication-accumulation of current through each RRAM device on the BL" See also FIG. 3. The RRAM crossbar is interpreted as synonymous with a multiply-accumulate circuit.)
	with the MUXes configured to route particular synaptic weight values to particular MAC circuits in accordance with a particular MUX connectivity configuration; ([p. 283 §IIIC] "In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j" [p. 283 §IIID] "Secondly, a tensorization of weight matrix can decompose the big matrix into many small tensor-core matrices, which can effectively reduce the configuration time of RRAM. Lastly, the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time" FIG. 3 on p. 283 shows that the weights are passed through the RRAM to the multiplexers. Huang explicitly teaches that the weight matrices are subdivided and routed through the input and output multiplexers.)
	and a MUX connectivity configuration circuit formed in the die and configured to determine the particular MUX connectivity configuration for different layers of a neural network; ([p. 283 §IIIC] "The detailed design of a tensor core is also shown in Fig. 3. In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j." FIG. 3 on p. 283 shows the MUX connectivity configuration circuit with respect to a particular hidden layer.).

Park and Huang are both directed towards a 3D stacked memory implementation of a neural network accelerator. Therefore, Park and Huang are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park with that of Huang by using multiplexers and multiply-accumulate circuits in the accelerator. Huang outlines a number of benefits on p. 283 (§IIID), including but not limited to “the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time”.
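For illustration only, the tensor-core arrangement relied upon from Huang (several stored 2D weight slices, a MUX pair that connects one slice at a time according to i and j, and bit-line current summation acting as a multiply-accumulate) may be sketched as follows. The class and function names, the per-layer configuration table, and the idealized floating-point conductances are assumptions of this sketch and are not taken from Huang or from the claims.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def crossbar_mac(conductance: Matrix, voltages: Sequence[float]) -> List[float]:
    """Idealized crossbar: the current on each bit line is the sum over word
    lines of (input voltage x cell conductance), i.e. a multiply-accumulate."""
    n_bitlines = len(conductance[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductance))
        for j in range(n_bitlines)
    ]


class TensorCore:
    """Hypothetical tensor core holding several 2D weight slices, of which
    the input/output MUX pair connects exactly one at a time."""

    def __init__(self, slices: Dict[Tuple[int, int], Matrix]):
        self.slices = slices
        self.selected: Tuple[int, int] = next(iter(slices))

    def configure_mux(self, i: int, j: int) -> None:
        # Plays the role of the selection module: pick the slice indexed by (i, j).
        if (i, j) not in self.slices:
            raise KeyError(f"no weight slice stored for index {(i, j)}")
        self.selected = (i, j)

    def run(self, voltages: Sequence[float]) -> List[float]:
        # Only the currently selected slice is routed to the MAC (crossbar) path.
        return crossbar_mac(self.slices[self.selected], voltages)


# Hypothetical per-layer MUX connectivity configuration table.
LAYER_CONFIG = {0: (0, 0), 1: (0, 1)}

core = TensorCore({
    (0, 0): [[1.0, 0.0], [0.0, 1.0]],
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
})
core.configure_mux(*LAYER_CONFIG[1])
print(core.run([1.0, 2.0]))  # [4.0, 7.0]
```

In this sketch, changing the layer index only changes which stored slice the MUX connects; the MAC path itself is unchanged.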

	Regarding claim 2, the combination of Park and Huang teaches The apparatus of claim 1, wherein the neural network processing circuits are configured as one or more of under-the-array circuits and next-to-the-array circuits. (Huang [p. 282 Sec. III B] "The proposed 3D CMOS-RRAM accelerator is shown in Fig. 2(a). This accelerator is composed of a top layer of wordlines, a bottom layer of CMOS circuits and vertical connection between both layers by RRAM" Bottom layer of CMOS circuits is interpreted as synonymous with under-the-array circuit.).

Regarding claim 5, the combination of Park and Huang teaches The apparatus of claim 1, wherein the neural network processing circuits are configured to perform backpropagation operations in parallel on the synaptic weight values. (Huang [p. 282 §IIB] "The TNN can be further fine tuned by backward propagation on the tensor cores" [p. 284 §V] "In this paper, we propose a 3D CMOS-RRAM accelerator for highly-parallel yet energy-efficient machine learning").

	Regarding claim 6, the combination of Park and Huang teaches The apparatus of claim 5, wherein the neural network processing circuits comprise: a plurality of synaptic weight determination circuits disposed in parallel (Huang [p. 282 §IIB] "We define a tensorized neural network (TNN) if the weight of the neural network can be represented in the tensor-train data format" Tensor train interpreted as synonymous with weight determination circuit.)
	and a plurality of synaptic weight update circuits disposed in parallel (Huang [p. 282 §IIB] "Then we adjust one tensor core and fix the rest tensor cores for the minimization of ||HG1G2...Gd − W||2. Finally, we iteratively perform the optimization of each tensor core until the error is small or the maximum iterative time reaches. The TNN can be further fine tuned by backward propagation on the tensor cores" Adjusting a tensor core to minimize loss is interpreted as synonymous with updating weights.).
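As an illustration of the quoted strategy of adjusting one tensor core while the others are held fixed, the following sketch reduces the idea to two factors of a rank-1 matrix approximation, W approximately equal to a b^T, and alternates closed-form least-squares updates of each factor. The two-factor, rank-1 simplification and the function names are assumptions of this sketch; Huang's tensor-train decomposition uses d cores and is not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def reconstruction_error(w: Matrix, a: List[float], b: List[float]) -> float:
    """Squared Frobenius norm of (a b^T - W)."""
    return sum(
        (a[r] * b[c] - w[r][c]) ** 2
        for r in range(len(a))
        for c in range(len(b))
    )


def alternate_rank1(w: Matrix, iters: int = 20) -> Tuple[List[float], List[float], List[float]]:
    """Alternating updates: hold one factor fixed, solve the other exactly."""
    rows, cols = len(w), len(w[0])
    a, b = [1.0] * rows, [1.0] * cols
    history = [reconstruction_error(w, a, b)]
    for _ in range(iters):
        bb = sum(x * x for x in b)
        if bb == 0.0:
            break
        # Fix b, update a: least-squares solution a = W b / (b . b)
        a = [sum(w[r][c] * b[c] for c in range(cols)) / bb for r in range(rows)]
        aa = sum(x * x for x in a)
        if aa == 0.0:
            break
        # Fix a, update b: least-squares solution b = W^T a / (a . a)
        b = [sum(w[r][c] * a[r] for r in range(rows)) / aa for c in range(cols)]
        history.append(reconstruction_error(w, a, b))
    return a, b, history


a, b, hist = alternate_rank1([[2.0, 4.0], [3.0, 6.0]])
print(hist[0])   # 39.0 (all-ones starting factors)
print(hist[-1])  # close to 0: this W is exactly rank 1
```

Because each half-step is an exact least-squares solve for one factor with the other fixed, the recorded error cannot increase from one iteration to the next.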

	Claims 11-12 are substantially similar to claims 1-2. Therefore, the rejection applied to claims 1-2 also applies to claims 11-12.

	Claim 17 is substantially similar to claim 1. Therefore, the rejection applied to claim 1 also applies to claim 17.

	Regarding claim 25, the combination of Park and Huang teaches The method of claim 11, wherein the particular MUX connectivity configuration is loaded based on a relevant set of synaptic weights. (Huang [p. 283 Sec. III C.] "Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j" The indices i and j are explicitly taught as selecting which 2D matrix of synaptic weights is connected.).


	Claims 3 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park and Huang, in further view of Garbin (US20200151550A1).

	Regarding claim 3, the combination of Park and Huang teaches The apparatus of claim 1.
	However, the combination of Park and Huang does not explicitly teach the neural network processing circuits are configured to perform feedforward neural network operations in parallel using the synaptic weight values.  

Garbin, in the same field of endeavor, teaches the neural network processing circuits are configured to perform feedforward neural network operations in parallel using the synaptic weight values. ([¶0003] "In DNNs, data flows from the input layer to the output layer without looping back; they are feedforward networks." [¶0011] "In a neural network circuit according to embodiments of the present disclosure, the weighted current components may be provided by driving multiple transistors in parallel."). 

	Park, Huang, and Garbin are all directed towards a 3D stacked memory implementation of a neural network accelerator.  Therefore, Park, Huang, and Garbin are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park and Huang with that of Garbin by having the neural network operate in a feedforward mode.  One of ordinary skill in the art would recognize that a feedforward neural network is the simplest form of neural network and a feedforward mode is well known.  The perceived intention of a stacked memory neural network accelerator is to increase processor throughput by increasing density.  In view of this, Garbin provides as a motivation for combination ([¶0071] "In example embodiments of the present disclosure a 3D NAND configuration provides the highest possible density option.").  

	Regarding claim 23, the combination of Park and Huang teaches The apparatus of claim 1.  
However, the combination of Park and Huang does not explicitly teach the MUX connectivity configuration circuit is configured to load the particular MUX connectivity configuration based on a relevant set of synaptic weights for each NAND block.  

Garbin, in the same field of endeavor, teaches the MUX connectivity configuration circuit is configured to load the particular MUX connectivity configuration based on a relevant set of synaptic weights for each NAND block. ([¶0072] "Also the reference signals for the reference pull-up and pull-down networks are provided as an input to the 3D NAND array 61. By the MAC operation, output signals are generated at the port OUT, which output signals are brought to the nodes of a next layer of the neural network." [¶0073] "the output signals of a particular layer may also be fed back to the input of a next layer, where these signals will act as the new input signals to be applied to this next layer. The output of the array should be stored, for example in a register 63. At the next clock cycle, the control unit 62 will provide the correct signals to the multiplexers").

	Park, Huang, and Garbin are all directed towards a 3D stacked memory implementation of a neural network accelerator. Therefore, Park, Huang, and Garbin are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park and Huang with that of Garbin by having a control unit provide the multiplexer signals for each layer of the neural network stored in the 3D NAND array, as taught by Garbin. The perceived intention of a stacked memory neural network accelerator is to increase processor throughput by increasing density. In view of this, Garbin provides as a motivation for combination ([¶0071] "In example embodiments of the present disclosure a 3D NAND configuration provides the highest possible density option.").

	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park, Huang, and Garbin and in further view of Ma (US 2018/0075344 A1).

	Regarding claim 4, the combination of Park, Huang, and Garbin teaches The apparatus of claim 3, wherein the neural network processing circuits include multiplication circuits (Garbin [¶0003] "During inference (classification) mode, input data (image, sound track, etc.) are transformed by a series of Multiply Accumulate (MAC) operations, i.e. sums weighted by the synapses values, and non-linearity functions performed by the neurons. At the output layer, the active neuron will indicate the class of the input (classification)")
	configured for computing products of synaptic weight values and activation values; summation circuits configured to sum the products (Garbin [¶0047] "The weighted sum is a multiply accumulate (MAC) operation. In this calculation, a set of inputs VIN,i are multiplied by a set of weights Wi,j, and those values are summed to create a final result." The set of inputs VIN,i is interpreted as synonymous with activation values.).
and rectified linear unit (RLU) and/or sigmoid function circuits configured to compute RLU and/or sigmoid functions from resulting values (Park [p. 423 §IID] "We also used a rectifier linear unit as an activation function")
	However, the combination of Park, Huang, and Garbin does not explicitly teach bias addition circuits configured to add a bias value to the sums. 

Ma, in the same field of endeavor, teaches bias addition circuits configured to add a bias value to the sums; ([¶0035] “Their activations can hence be computed with a matrix multiplication followed by a bias offset.").

	Park, Huang, Garbin, and Ma are all directed towards a memory-based neural network accelerator. Therefore, Park, Huang, Garbin, and Ma are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Park, Huang, and Garbin with the teachings of Ma by adding a bias to the sum.
A bias term is well known in the art and it would be obvious to one of ordinary skill in the art to use one.  This is further reinforced by Ma, who describes an analogous neural network accelerator and mentions as a motivation for combination with other arts regarding neural network accelerators ([¶0008] "Embodiments of the present disclosure are directed to a neural network hardware accelerator architecture and the operating method thereof capable of improving the performance and efficiency of a neural network accelerator").
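The sequence of operations mapped above for claim 4 (products of synaptic weights and activation values, summation of the products, bias addition, and a ReLU or sigmoid nonlinearity) may be sketched as follows. The function names and the row-per-neuron weight layout are assumptions of this illustration and do not represent the circuit structure of any cited reference.

```python
import math
from typing import Callable, List, Sequence


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neuron_layer(
    weights: Sequence[Sequence[float]],  # weights[j][i]: weight from input i to neuron j
    activations: Sequence[float],
    biases: Sequence[float],
    activation_fn: Callable[[float], float] = relu,
) -> List[float]:
    outputs = []
    for w_row, bias in zip(weights, biases):
        acc = sum(w * x for w, x in zip(w_row, activations))  # products, then summation (MAC)
        acc += bias                                            # bias addition
        outputs.append(activation_fn(acc))                     # ReLU or sigmoid
    return outputs


print(neuron_layer([[1.0, -2.0], [0.5, 0.5]], [3.0, 1.0], [0.5, -1.0]))  # [1.5, 1.0]
```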

	Claims 21 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park and Huang, in further view of Chang Huang (US10241837B2).

	Regarding claim 21, the combination of Park and Huang teaches The apparatus of claim 1, wherein the full MUX connectivity connects all neuron outputs of a previous layer to neurons of a next layer of the layers of the neural network, and wherein the partial MUX connectivity connects only some of the neuron outputs of the previous layer to neurons of the next layer (Huang [p. 281 §IIA] "the fully-connected layer is a special case of convolutional layer with kernel size 1 × 1, such tensorized weights can also be applied to other convolutional layers." [p. 285 §V] "3-layer neural network with two full-connected layer...hidden nodes are all fixed to 1024 with 1 hidden layers....4-layer neural network with 3 full-connected layer" Huang explicitly teaches both full and partial hidden layer connectivity and further explicitly teaches using a neural network with both fully and partially connected layers. It would further have been obvious to one of ordinary skill in the art that a layer that is not fully connected would be partially connected, and by definition would be a layer in which not all neurons of a previous layer are connected to a next layer, as shown in FIG. 1 of Huang, where H(L-1) is not connected to the first node of H(L).).
	However, the combination of Park and Huang does not explicitly teach the MUX connectivity configuration circuit is configured to select between a partial MUX connectivity and a full MUX connectivity.

Chang Huang, in the same field of endeavor, teaches the MUX connectivity configuration circuit is configured to select between a partial MUX connectivity and a full MUX connectivity. ([Col. 24 l. 31-34] "the same set of calculation circuits can be used for different types of layers including convolution layer, pooling layer, upscale, ReLU or fully-connected layer. In some cases, different operations may share the same set of calculation circuits by using a multiplexer for controlling data paths or data flow in accordance with the operations."). 

	Park, Huang, and Chang Huang are all directed towards a 3D stacked memory implementation of a neural network accelerator.  Therefore, Park, Huang, and Chang Huang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park and Huang with that of Chang Huang by using the multiplexer configuration circuit for both fully connected and partially connected neural network layers. Chang Huang teaches that this added flexibility broadens the scope of application of the neural network, and provides as further motivation for combination with other analogous arts ([Col. 7 l. 3-10] “the method and system provide an efficient data transmission between a main memory and a chip implements the parallel operations. The efficient data transmission may be achieved by dense parameter and input data packing. This data arrangement may also simplify instructions and reduce memory access. The parallel operations may include operations in a CNN layer and a smooth data pipelining or seamless dataflow between layers may be provided by data management").    
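The distinction relied upon between full and partial connectivity may be illustrated with a 0/1 mask standing in for a MUX connectivity configuration: full connectivity routes every previous-layer output to every next-layer neuron, while partial connectivity routes only some. Representing the MUX configuration as a mask, and the function names used, are assumptions of this sketch; neither Huang nor Chang Huang is asserted to use this representation.

```python
from typing import List, Sequence

Mask = List[List[int]]


def full_connectivity(n_out: int, n_in: int) -> Mask:
    """Every previous-layer output is routed to every next-layer neuron."""
    return [[1] * n_in for _ in range(n_out)]


def apply_connectivity(
    weights: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    prev_outputs: Sequence[float],
) -> List[float]:
    """Weighted sums in which a 0 in mask[j][i] means the MUX does not route
    output i of the previous layer to neuron j of the next layer."""
    return [
        sum(w * m * x for w, m, x in zip(w_row, m_row, prev_outputs))
        for w_row, m_row in zip(weights, mask)
    ]


weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
prev = [1.0, 2.0, 3.0]
full = full_connectivity(2, 3)
partial = [[0, 1, 1], [1, 1, 1]]  # hypothetical: output 0 not routed to neuron 0
print(apply_connectivity(weights, full, prev))     # [6.0, 6.0]
print(apply_connectivity(weights, partial, prev))  # [5.0, 6.0]
```

Switching between the two configurations changes only the mask; the weights and the summation path are reused, which is the kind of shared-circuit flexibility the quoted Chang Huang passage describes.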

Regarding claim 24, the combination of Park and Huang teaches The method of claim 11, wherein the full MUX connectivity connects all neuron outputs of a previous layer to neurons of a next layer of the layers of the neural network, and wherein the partial MUX connectivity connects only some of the neuron outputs of the previous layer to neurons of the next layer. (Huang [p. 281 §IIA] "the fully-connected layer is a special case of convolutional layer with kernel size 1 × 1, such tensorized weights can also be applied to other convolutional layers." [p. 285 §V] "3-layer neural network with two full-connected layer...hidden nodes are all fixed to 1024 with 1 hidden layers....4-layer neural network with 3 full-connected layer" Huang explicitly teaches both full and partial hidden layer connectivity and further explicitly teaches using a neural network with both fully and partially connected layers. It would further have been obvious to one of ordinary skill in the art that a layer that is not fully connected would be partially connected, and by definition would be a layer in which not all neurons of a previous layer are connected to a next layer, as shown in FIG. 1 of Huang, where H(L-1) is not connected to the first node of H(L).).
However, the combination of Park and Huang does not explicitly teach modifying the MUX connectivity configuration comprises changing the particular MUX connectivity configuration between a partial MUX connectivity and a full MUX connectivity.

	Chang Huang, in the same field of endeavor, teaches modifying the MUX connectivity configuration comprises changing the particular MUX connectivity configuration between a partial MUX connectivity and a full MUX connectivity. ([Col. 24 l. 31-34] "the same set of calculation circuits can be used for different types of layers including convolution layer, pooling layer, upscale, ReLU or fully-connected layer. In some cases, different operations may share the same set of calculation circuits by using a multiplexer for controlling data paths or data flow in accordance with the operations.").
	 
	Park, Huang, and Chang Huang are all directed towards a 3D stacked memory implementation of a neural network accelerator.  Therefore, Park, Huang, and Chang Huang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park and Huang with that of Chang Huang by using the multiplexer configuration circuit for both fully connected and partially connected neural network layers. Chang Huang teaches that this added flexibility broadens the scope of application of the neural network, and provides as further motivation for combination with other analogous arts ([Col. 7 l. 3-10] “the method and system provide an efficient data transmission between a main memory and a chip implements the parallel operations. The efficient data transmission may be achieved by dense parameter and input data packing. This data arrangement may also simplify instructions and reduce memory access. The parallel operations may include operations in a CNN layer and a smooth data pipelining or seamless dataflow between layers may be provided by data management").    


	Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park and Huang, in further view of Whatmough (US20200133831A1).

	Regarding claim 22, the combination of Park and Huang teaches The apparatus of claim 1.
	However, the combination of Park and Huang does not explicitly teach the neural network comprises N layers, and wherein the neural network processing circuits comprise a different number of MACs than MUX circuits.

Whatmough, in the same field of endeavor, teaches the neural network comprises N layers, and wherein the neural network processing circuits comprise a different number of MACs than MUX circuits ([¶0069] "Specifically, FIG. 8, illustrates a hardware arrangement 800 to implement a transpose, shown as an IM2COL transpose (FIG. 7). As shown in FIG. 8, arrangement 800 includes one or more IFM SRAMs, shown generally as 802, a transpose module 804, a temporary register 806, crossbar 809, which could be a multiplex (mux) crossbar, and module 820, which may be a SIMD unit or un-pipelined MAC array." See also FIG. 8. One of ordinary skill in the art would recognize that an array refers to more than one unit, while Whatmough shows that only a single multiplexer is needed.).

	Park, Huang, and Whatmough are all directed towards a memory based neural network accelerator.  Therefore, Park, Huang, and Whatmough are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Park and Huang with the teachings of Whatmough by using multiple MAC units for every multiplexer.  Whatmough teaches that a GEMM is a generic matrix multiplication operation commonly used in neural networks, and teaches that said operations can be performed through a MAC unit array ([¶0040] “A common approach to implement convolution in a CPU (central processing unit), a GPU (graphics processing unit) and dedicated hardware accelerators is to convert it into a generic matrix multiplication (GEMM) operation. “ [¶0042] “In software, the GEMM operation is performed by calling a library function. In hardware, the GEMM operation is often implemented efficiently as a 2D MAC array”).  
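Whatmough's point that convolution is commonly lowered to a GEMM that hardware executes on a 2D MAC array may be illustrated with a short sketch in which each output element has its own accumulator. In this sketch the number of MAC units (one per output element, n x m) is set by the matrix dimensions rather than by any count of routing multiplexers. The loop order and function name are assumptions of this illustration, not Whatmough's implementation.

```python
from typing import List, Sequence


def gemm_mac_array(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[List[float]]:
    """C = A x B computed as a 2D grid of MAC units: the unit at (i, j)
    accumulates a[i][k] * b[k][j] over k, one k per step."""
    n, k_dim, m = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0.0] * m for _ in range(n)]  # one accumulator per MAC unit
    for k in range(k_dim):                # each step feeds one column of A and one row of B
        for i in range(n):
            for j in range(m):
                acc[i][j] += a[i][k] * b[k][j]
    return acc


print(gemm_mac_array([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
# [[19.0, 22.0], [43.0, 50.0]]
```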
	
	Claims 26-28 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Li (US 20190019564 A1).

	Regarding claim 26, Huang teaches An apparatus, comprising: a die comprising non-volatile memory (NVM) elements; (See FIG. 2. [p. 282 Sec. III B] "stacking non-volatile memories on top of microprocessors enables cost-effective heterogeneous integration")
	a plurality of neural network processing circuits formed in the die and configured to read synaptic weight values in parallel from a plurality of word lines of NVM elements of the die and perform neural network operations in parallel using the synaptic weight values; and a circuit formed on the die ([p. 282 Sec. III A] "In one RRAM-crossbar, given the input probing voltage, the current on each bit-line (BL) is the multiplication-accumulation of current through each RRAM device on the BL. Therefore, the RRAM-crossbar array can intrinsically perform the analog matrix-vector multiplication [17]. Given an input voltage vector...where ci,j is configurable conductance of the RRAM resistance Ri,j , which can represent real number of weight." [p. 283 Sec. III D] "the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time" [p. 280 Sec. I] "the 3D CMOS-RRAM integration can further support more parallelism with higher I/O bandwidth in acceleration" Huang explicitly teaches that the RRAM accesses weights from the word lines to perform multiplication and that the multiplication can be performed in a highly parallel fashion. Huang further teaches that the overall aim of the CMOS-RRAM integration circuit is to support higher parallelism through higher I/O bandwidth.)
	 and configured to perform an on-chip NVM fold operation to: read at least some of the synaptic weight values from a plurality of first word lines of the plurality of word lines, each of the first word lines comprising single-level-cell (SLC) NVM elements of a portion of the NVM configured to operate in an SLC mode ([p. 281 §IIA] "A two-dimensional weight is folded into three-dimensional tensor and then decomposes into tensor cores G1,G2, ...Gd" FIG. 2 shows that the explicit word lines are expressed as a single NVM layer. FIG. 2 (b) shows that the word lines represent synaptic weights. Operation of the single NVM layer is interpreted as operation in single-level-cell (SLC) mode.)
	update the synaptic weight values read from the first word lines using at least one of the plurality of the neural network processing circuits, ([p. 281 Sec. II A] "To build a multi-layer neural network, we propose a layerwise training process based on stack auto-encoder for low rank tensor cores and high compression rate. An auto-encoder layer is to set the layer output T the same as input X and find an optimal weight to represent itself. For example, we need to train a tensorized weight W" Training the weight is interpreted as synonymous with updating the synaptic weight. Setting output T to input X is interpreted as updating the value of input X.).
	However, Huang does not explicitly teach storing the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements of a portion of the NVM configured to operate in an MLC mode.

Li, in the same field of endeavor, teaches storing the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements of a portion of the NVM configured to operate in an MLC mode ([¶0196] "the MLC NVM matrix circuit 1900 is also configured to train the resistance of the MLC NVM storage circuits MLC-R.sub.00-MLC-R.sub.mn by supporting backwards propagation of a weight update according to the following formula:").

	Huang and Li are both directed toward memory-based neural network accelerators. Therefore, Huang and Li are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Huang with the multi-level cell NVM elements for neural network acceleration taught in Li. The combination would have been obvious because a person of ordinary skill in the art would recognize that, while Huang does not explicitly teach multi-level memory cells, Huang implicitly teaches storing updated weight values in stacked memories. Li is therefore introduced to reinforce this implicit teaching of storing updated weight values in stacked memory, within the scope of a neural network accelerator that is interpreted as having design goals similar to those of the accelerator taught in Huang. Li further supports the combination ([¶0242] “The system memory chip 2608 could be connected to the dedicated MLC NVM matrix circuit chip 2602 through a dedicated local bus to improve performance. The dedicated MLC NVM matrix circuit chip 2602 could also be embedded into the SoC 2606 to save power and improve performance.”).

	Regarding claim 27, claim 27 is substantially similar to claim 26. Therefore, the rejection applied to claim 26 also applies to claim 27.

	Regarding claim 28, the combination of Huang and Li teaches The method of claim 27, further comprising: performing neural network operations in parallel using the synaptic weight values, (Huang [p. 283 §IIID] "Secondly, a tensorization of weight matrix can decompose the big matrix into many small tensor-core matrices, which can effectively reduce the configuration time of RRAM. Lastly, the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time")
	wherein the neural network operations are performed in parallel by a plurality of neural network processing components formed within the die (Huang [p. 283 §IIID] "Secondly, a tensorization of weight matrix can decompose the big matrix into many small tensor-core matrices, which can effectively reduce the configuration time of RRAM. Lastly, the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time")
	the plurality of neural network processing components comprising multiplexers (MUXes) and multiply-accumulate (MAC) components, (Huang [p. 283 §IIIC] "The detailed design of a tensor core is also shown in Fig. 3. In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used" [p. 282 §IIIA] "In one RRAM-crossbar, given the input probing voltage, the current on each bit-line (BL) is the multiplication-accumulation of current through each RRAM device on the BL" See also FIG. 3. The RRAM crossbar is interpreted as synonymous with a multiply-accumulate circuit.)
	with the MUXes configured to route particular synaptic weight values to particular MAC circuits in accordance with a particular MUX connectivity configuration; (Huang [p. 283 §IIIC] "In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j" [p. 283 §IIID] "Secondly, a tensorization of weight matrix can decompose the big matrix into many small tensor-core matrices, which can effectively reduce the configuration time of RRAM. Lastly, the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time" FIG. 3 on p. 283 shows that the weights are passed through the RRAM to the multiplexers. Huang explicitly teaches that the weight matrices are subdivided and routed through the input and output multiplexers.)
	modifying the MUX connectivity configuration for a different layer of a neural network and then performing additional neural network operations; and wherein each neural network layer is stored in a separate NAND block of the die. (Huang [p. 283 Sec. III C] "The detailed design of a tensor core is also shown in Fig. 3. In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j." FIG. 3 on p. 283 shows the MUX connectivity configuration circuit with respect to a particular hidden layer.)

	Regarding claim 31, claim 31 is substantially similar to claim 28.  Therefore, the rejection applied to claim 28 also applies to claim 31. 

	Claims 29 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Park in view of Huang and in further view of Li.

	Regarding claim 29, Huang teaches The apparatus of claim 1, further comprising a NVM storage circuit formed in the die and (See FIG. 2. [p. 282 Sec. III B] "stacking non-volatile memories on top of microprocessors enables cost effective heterogeneous integration")
	configured to perform an on-chip NVM fold operation by: reading at least some of the synaptic weight values from a plurality of first word lines of the plurality of word lines, each of the first word lines comprising single-level-cell (SLC) NVM elements of a portion of the NVM configured to operate in an SLC mode ([p. 281 §IIA] "A two-dimensional weight is folded into three-dimensional tensor and then decomposes into tensor cores G1,G2, ...Gd" FIG. 2 shows that the explicit word lines are expressed as a single NVM layer. FIG. 2 (b) shows that the word lines represent synaptic weights. Operation of the single NVM layer is interpreted as operation in single-level-cell (SLC) mode.)
	updating the synaptic weight values read from the first word lines using at least one of the plurality of the neural network processing circuits, ([p. 281 Sec. II A] "To build a multi-layer neural network, we propose a layerwise training process based on stack auto-encoder for low rank tensor cores and high compression rate. An auto-encoder layer is to set the layer output T the same as input X and find an optimal weight to represent itself. For example, we need to train a tensorized weight W" Training the weight is interpreted as synonymous with updating the synaptic weight. Setting output T to input X is interpreted as updating the value of input X.).
	However, Huang does not explicitly teach storing the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements of a portion of the NVM configured to operate in an MLC mode.

Li, in the same field of endeavor, teaches storing the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements of a portion of the NVM configured to operate in an MLC mode ([¶0196] "the MLC NVM matrix circuit 1900 is also configured to train the resistance of the MLC NVM storage circuits MLC-R.sub.00-MLC-R.sub.mn by supporting backwards propagation of a weight update according to the following formula:").

	Park, Huang, and Li are all directed toward memory-based neural network accelerators. Therefore, Park, Huang, and Li are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Park and Huang with the multi-level cell NVM elements for neural network acceleration taught in Li. The combination would have been obvious because a person of ordinary skill in the art would recognize that, while Huang does not explicitly teach multi-level memory cells, Huang implicitly teaches storing updated weight values in stacked memories. Li is therefore introduced to reinforce this implicit teaching of storing updated weight values in stacked memory, within the scope of a neural network accelerator that is interpreted as having design goals similar to those of the accelerator taught in Huang. Li further supports the combination ([¶0242] “The system memory chip 2608 could be connected to the dedicated MLC NVM matrix circuit chip 2602 through a dedicated local bus to improve performance. The dedicated MLC NVM matrix circuit chip 2602 could also be embedded into the SoC 2606 to save power and improve performance.”).

	Regarding claim 30, claim 30 is substantially similar to claim 29.  Therefore, the rejection applied to claim 29 also applies to claim 30.  

	Claims 32-33 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park and Huang, in further view of Garland (“Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing”, 2018).

	Regarding claim 32, the combination of Park and Huang teaches The apparatus of claim 1.
	However, the combination of Park and Huang does not explicitly teach that at least one of the MAC circuits is configured to perform each of a MAC operation, a bias value addition operation, and at least one of a rectified linear unit (RLU) operation or a sigmoid computation operation.

Garland, in the same field of endeavor, teaches that at least one of the MAC circuits is configured to perform each of a MAC operation, a bias value addition operation, and at least one of a rectified linear unit (RLU) operation or a sigmoid computation operation ([p. 31:1 §1] "We reduce power and area of the CNN by implementing parallel accumulate shared MAC (PASM) in a weight shared CNN" [p. 31:12 §4] "For comparison, three versions of the accelerator, a non-weight-shared, a weight-shared, and a weight-shared-with-PASM accelerator, are designed and synthesized...The three versions of the CNN accelerators are based on the AlexNet [14] CNN and accelerate one layer of the convolution to allow for implementation in an FPGA. The accelerators include stride, an activation function, ReLU, and bias").

	Park, Huang, and Garland are all directed toward neural network accelerators. Therefore, Park, Huang, and Garland are all analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the MAC unit in the combination of Park and Huang with that in Garland. Garland provides a motivation for the combination ([p. 31:2 §1] “We also show that PASM is beneficial when implemented in a resource-constrained FPGA as PASM consumes fewer block RAMs (BRAMs) and DSP units for the MAC operations in the FPGA.”).

	Claim 33 is substantially similar to claim 32.  Therefore, the rejection applied to claim 32 also applies to claim 33.  

	Claims 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park and Huang, in further view of Seligson (WO1992020029A1).

	Regarding claim 34, the combination of Park and Huang teaches The apparatus of claim 1.
	However, the combination of Park and Huang does not explicitly teach that the die further comprises a sense latch and an accumulator latch, and wherein the neural network operations comprise feedforward operations with a result of the feedforward operations for a first layer of the neural network stored in the sense latch and with the result of additional feedforward operations for additional layers accumulated in the accumulator latch.

Seligson, in the same field of endeavor, teaches the die further comprises a sense latch and an accumulator latch, and wherein the neural network operations comprise feedforward operations with a result of the feedforward operations for a first layer of the neural network stored in the sense latch and with the result of additional feedforward operations for additional layers accumulated in the accumulator latch. ([p. 19 l. 10-15] "The difference neuron of Figure 4 can be incorporated in a feedback network as well as in feedforward network discussed above" [p. 15 l. 14-p. 16 l. 14] "The output, Uj, of ADC 63 is temporarily stored in latch 64 and fed, in parallel, to the first layer of multiplexed neurons, typically represented by unit 86...The output of neuron 86 is stored in latch 112...The N vector components are selectable by means of MUX 114 and select control 115 for application to the second neuron, Neuron 116 being representative of the neurons in the second layer. Neuron 116 is structure like unit 86, accepting yC) as an input vector on line 118 and w(i2) as the corresponding reference vector. Its output is placed in latch 124 for outputting either as a digital value or through DAC 126 for analog output. Clearly, additional layers could be added by repeating the structure of the second layer."). 

	Park, Huang, and Seligson are all directed toward neural network accelerators. Therefore, Park, Huang, and Seligson are all analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Park and Huang with the teachings of Seligson by using a latch to store the results of the first layer and a latch attached to the accumulator to store the results of subsequent layers in a feedforward network. Seligson teaches that the proposed invention describes a difference network and provides a motivation for the combination ([p. 12 l. 18-22] "It is important to note that the difference neuron, as described above, has a significantly improved discrimination capability by using the offset, θ , to adjust the radius of the hypersphere and by locating its center through the choice of reference weights {w-}. In addition, the "shape" of the hyperspace is adjustable by selecting the appropriate distance metric.").

	Regarding claim 35, claim 35 is substantially similar to claim 34.  Therefore, the rejection applied to claim 34 also applies to claim 35. 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126