Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicant's Amendment filed on December 21, 2021, in which claims 1-6, 9-12, and 15-17 are amended and claims 7-8, 13-14, and 18-20 are cancelled. Claims 1-6, 9-12, 15-17, and 21-27 are currently pending.

Claim Objections
Applicant's amendments made to the claims are acknowledged. Examiner’s objections to the claims are hereby withdrawn, as necessitated by Applicant’s amendments made to the claims.

Response to Arguments
The rejections of claims 6, 13, and 17 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by Applicant's amendments and remarks.
Applicant’s arguments with respect to the rejection of claims 1-6, 9-12, and 15-17 under 35 U.S.C. 103(a), based on the amendment, have been considered and are persuasive. The arguments are moot in view of the new ground of rejection set forth below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 11, 17, and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon (US 2016/0026912 A1) in view of Huang (“LTNN: An Energy-efficient Machine Learning Accelerator on 3D CMOS-RRAM for Layer-wise Tensorized Neural Network”, 2017).

Regarding claim 1, Falcon teaches the plurality of neural network processing circuits comprising multiplexers (MUXes) and multiply-accumulate (MAC) circuits, ([¶0088] "Calculation circuits 1118 may be implemented in any suitable manner. For example, calculation circuits 1118 may be implemented using a suitable combination of multipliers, multiplexers, delay elements, and adders." [¶0089] "Calculation circuit 1200 may be formed from reconfigurable components. Calculation circuit 1200 may include, for example, a multiply-and-accumulate (MAC) unit 1210").
with the MUXes configured to route particular synaptic weight values to particular MAC circuits in accordance with a particular MUX connectivity configuration; ([¶0107] "For example, processing device 1000 may include registers for storing weights or input values as well as multiplexers to route values to appropriate multiplication circuits"). However, Falcon does not explicitly teach the following limitations:
a MUX connectivity configuration circuit formed in the die and configured to determine the particular MUX connectivity configuration for different layers of a neural network;
a die comprising non-volatile memory (NVM) elements formed in the die and arranged in a plurality of wordlines; and
a plurality of neural network processing circuits formed in the die and configured to access the synaptic weight values in parallel from the word lines and perform neural network operations in parallel using the synaptic weight values.

Huang, who teaches a related art of a neural network accelerator, teaches a MUX connectivity configuration circuit formed in the die and configured to determine the particular MUX connectivity configuration for different layers of a neural network. ([p. 283 Sec. III C.] "The detailed design of a tensor core is also shown in Fig. 3. In each tensor core, we store different slices of the 3-dimensional matrix into different RRAM-crossbars. Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j." FIG. 3 on p. 283 shows the MUX connectivity configuration circuit with respect to a particular hidden layer.).
a die comprising non-volatile memory (NVM) elements formed in the die and arranged in a plurality of wordlines ([p. 282 Sec. III B] "The proposed 3D CMOS-RRAM accelerator is shown in Fig. 2(a). This accelerator is composed of a top layer of wordlines, a bottom layer of CMOS circuits and vertical connection between both layers by RRAM" FIG. 2 shows RRAM in an arrangement corresponding to a plurality of wordlines. RRAM is interpreted as synonymous with non-volatile memory. Huang explicitly teaches the configuration of elements in the circuit in the instant application; therefore, forming the elements in the die would lead to an obvious and expected outcome.).
a plurality of neural network processing circuits formed in the die and configured to access the synaptic weight values in parallel from the word lines and perform neural network operations in parallel using the synaptic weight values ([p. 282 Sec. III A] "In one RRAM-crossbar, given the input probing voltage, the current on each bit-line (BL) is the multiplication-accumulation of current through each RRAM device on the BL. Therefore, the RRAM-crossbar array can intrinsically perform the analog matrix-vector multiplication [17]. Given an input voltage vector...where ci,j is configurable conductance of the RRAM resistance Ri,j , which can represent real number of weight." [p. 283 Sec. III D] "the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time" [p. 280 Sec. I] "the 3D CMOS-RRAM integration can further support more parallelism with higher I/O bandwidth in acceleration" Huang explicitly teaches that the RRAM accesses weights from the wordlines to perform multiplication and that the multiplication can be performed in a highly parallel fashion. Huang further teaches that the overall aim of the CMOS-RRAM integration circuit is to support higher parallelism through higher I/O bandwidth.). 
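For illustration only, the crossbar behavior quoted from Huang above can be sketched as an idealized matrix-vector multiplication in which each bit-line current is the sum of voltage-conductance products. The function name `crossbar_mvm`, the list-of-lists layout (rows as word lines, columns as bit lines), and the numeric values are assumptions made for this sketch and are not taken from Falcon, Huang, or the instant application.

```python
def crossbar_mvm(conductance, voltages):
    """Idealized crossbar: current on bit line j = sum_i voltages[i] * conductance[i][j].

    conductance[i][j] is a hypothetical conductance at word line i and bit line j;
    voltages[i] is the input voltage applied to word line i.
    """
    if len(conductance) != len(voltages):
        raise ValueError("one input voltage is required per word line")
    n_bit_lines = len(conductance[0])
    currents = [0.0] * n_bit_lines
    for v, row in zip(voltages, conductance):
        for j, c in enumerate(row):
            currents[j] += v * c  # multiply-accumulate along bit line j
    return currents


# Illustrative values only: 3 word lines by 2 bit lines.
weights = [[1.0, 0.5],
           [2.0, 0.0],
           [0.0, 4.0]]
inputs = [1.0, 2.0, 0.5]
bit_line_currents = crossbar_mvm(weights, inputs)
```

In this simplified model every bit line is independent of the others, which is the property relied upon when the quoted passages describe the multiplication as highly parallel.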

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Falcon with that of Huang. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Huang the advantages of using a stacked non-volatile memory component to access wordlines in parallel.  Huang outlines a number of benefits on [p. 283 Sec. III D] including but not limited to (“the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time”). 

Regarding claim 2, the combination of Falcon and Huang teaches The apparatus of claim 1, wherein the neural network processing circuits are configured as one or more of under-the-array circuits and next-to-the-array circuits (Huang [p. 282 Sec. III B] "The proposed 3D CMOS-RRAM accelerator is shown in Fig. 2(a). This accelerator is composed of a top layer of wordlines, a bottom layer of CMOS circuits and vertical connection between both layers by RRAM" Bottom layer of CMOS circuits is interpreted as synonymous with under-the-array circuit.). 

Regarding claim 11, claim 11 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Regarding claim 17, claim 17 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Regarding claim 21, the combination of Falcon and Huang teaches
The apparatus of claim 1, wherein the MUX connectivity configuration circuit is configured to select between a partial MUX connectivity and a full MUX connectivity. ( [¶0100] "Returning to FIG. 12, in one embodiment, MAC unit 1210 may output the results of convolution and dot-product operations to latches 1212, 1214. The output form may include a bit for the sign, two bits for the integer, and fourteen bits for the fractional part. This output may include partial results which may be added to other partial results from, for example, the same calculation unit 1200" [¶0089] "Furthermore, calculation circuit 1200 may include any suitable number or combination of latches to stage communication between its elements" [¶0120] "At 1430, it may be determined whether partial results, previously determined by a calculation circuit working on the same layer, are available." Multiplexer facilitating partial calculation of layer is interpreted as synonymous with partial MUX connectivity.  With respect to the instant specification a latch is multiplexed to determine full or partial connectivity configuration.). 
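For illustration only, the output format quoted from Falcon [¶0100] (one sign bit, two integer bits, fourteen fractional bits) and the addition of partial results can be sketched as follows. The sign-magnitude encoding, the saturation behavior, and the names `encode`, `decode`, and `add_partials` are assumptions made for this sketch; the quoted passage does not specify them.

```python
FRAC_BITS = 14  # fourteen fractional bits, per the quoted passage
INT_BITS = 2    # two integer bits, per the quoted passage
MAX_MAGNITUDE = (1 << (INT_BITS + FRAC_BITS)) - 1  # largest 16-bit magnitude


def encode(value):
    """Return (sign, magnitude); sign-magnitude with saturation is an assumption."""
    sign = 1 if value < 0 else 0
    magnitude = min(round(abs(value) * (1 << FRAC_BITS)), MAX_MAGNITUDE)
    return sign, magnitude


def decode(sign, magnitude):
    """Convert a (sign, magnitude) pair back to a Python float."""
    value = magnitude / (1 << FRAC_BITS)
    return -value if sign else value


def add_partials(a, b):
    """Add two encoded partial results and re-encode the sum."""
    return encode(decode(*a) + decode(*b))


partial_a = encode(1.5)
partial_b = encode(-0.25)
combined = add_partials(partial_a, partial_b)
```

Values whose magnitude exceeds the representable range are clamped in this sketch; an actual circuit may handle overflow differently.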

Regarding claim 22, the combination of Falcon and Huang teaches The apparatus of claim 1, wherein the neural network comprises N layers, and wherein the neural network processing circuits comprise M MUXes and N MACs, where M is less than N. (Falcon [¶0089] "FIG. 12 illustrates an example embodiment of a calculation circuit 1200 that may be used to implement fully or in part calculation circuit 1118, in accordance with embodiments of the present disclosure. Calculation circuit 1200 may be formed from reconfigurable components. Calculation circuit 1200 may include, for example, a multiply-and-accumulate (MAC) unit 1210" [¶0094] "In one embodiment, for a given layer, the maximum and minimum values of weights 1204 may be determined." While Falcon teaches using a multiplexer between calculation circuits it is moot with respect to the fact that M can be zero, and the calculation circuit of Falcon alone therefore teaches the claim.). 

Regarding claim 23, the combination of Falcon and Huang teaches
The apparatus of claim 1, wherein the MUX connectivity configuration circuit is configured to load the particular MUX connectivity configuration based on a relevant set of synaptic weights. (Huang [p. 283 Sec. III C.] "Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j" i and j of the 2D matrix are explicitly taught as being synaptic weights.). 

Regarding claim 24, Falcon teaches The method of claim 11, wherein modifying the MUX connectivity configuration comprises changing the particular MUX connectivity configuration between a partial MUX connectivity and a full MUX connectivity. ([¶0100] "Returning to FIG. 12, in one embodiment, MAC unit 1210 may output the results of convolution and dot-product operations to latches 1212, 1214. The output form may include a bit for the sign, two bits for the integer, and fourteen bits for the fractional part. This output may include partial results which may be added to other partial results from, for example, the same calculation unit 1200" [¶0089] "Furthermore, calculation circuit 1200 may include any suitable number or combination of latches to stage communication between its elements" [¶0120] "At 1430, it may be determined whether partial results, previously determined by a calculation circuit working on the same layer, are available." Multiplexer facilitating partial calculation of layer is interpreted as synonymous with partial MUX connectivity. With respect to the instant specification a latch is multiplexed to determine full or partial connectivity configuration.). 

Regarding claim 25, Huang teaches The method of claim 11, wherein the particular MUX connectivity configuration is loaded based on a relevant set of synaptic weights. ([p. 283 Sec. III C.] "Since only one 2D matrix is used at a time, two tensor core Multiplexers (MUX) are used so that only one matrix is connected to the input voltage as well as the output ADC. The TC selection module controls the input and output MUX according to i and j" i and j of the 2D matrix are explicitly taught as being synaptic weights.). 
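For illustration only, the selection step quoted from Huang (only one stored 2D matrix connected at a time, chosen according to i and j) can be sketched as a lookup followed by a vector-matrix product. The dictionary keyed by `(i, j)`, the function names `select_slice` and `apply_slice`, and the numeric values are assumptions made for this sketch and do not reproduce Huang's circuit.

```python
def select_slice(tensor_core, i, j):
    """Stand-in for the MUX selection: return the one 2D slice chosen by (i, j)."""
    return tensor_core[(i, j)]


def apply_slice(matrix, vector):
    """Row-vector times matrix, standing in for the single connected crossbar."""
    n_cols = len(matrix[0])
    return [sum(v * row[col] for v, row in zip(vector, matrix))
            for col in range(n_cols)]


# Hypothetical tensor core holding four 2x2 slices.
tensor_core = {
    (0, 0): [[1, 0], [0, 1]],
    (0, 1): [[2, 0], [0, 2]],
    (1, 0): [[0, 1], [1, 0]],
    (1, 1): [[1, 1], [1, 1]],
}
out = apply_slice(select_slice(tensor_core, 1, 0), [3, 4])
```

Only the selected slice participates in the product; the remaining slices stay stored but disconnected, mirroring the quoted statement that only one matrix is connected to the input voltage and the output ADC.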

Claims 3-6, 12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon and Huang and in further view of Ma (US 2018/0075344 A1).

Regarding claim 3, the combination of Falcon and Huang teaches the apparatus of claim 1, however Falcon and Huang do not explicitly teach wherein the neural network processing circuits are configured to perform feedforward neural network operations in parallel using the synaptic weight values.

Ma, who teaches a related art of a hardware accelerated method for neural networks, teaches wherein the neural network processing circuits are configured to perform feedforward neural network operations ([¶0038] “Typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, such as DNN 210, CNN 230, or DBN 270") in parallel using the synaptic weight values ([¶0071] “The NN architecture with memory-centric implementations can leverage the massive parallelism and density of memory-centric design" [¶0006] “the ANN and SNN utilize different techniques on how the data fed through efficiently, computation complexity, memory bandwidth considerations for neurons, synaptic weights and how to accelerate thereof").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use feedforward operations in parallel in a neural network based architecture. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Ma [¶0038] that a feed-forward mode is a standard mode of operation in supervised machine learning.

	Regarding claim 4, the combination of Falcon, Huang, and Ma teaches The apparatus of claim 3, wherein the neural network processing circuits include one or more of: multiplication circuits configured for computing products of synaptic weight values and activation values (Falcon [¶0104] “FIG. 12, once a result is final it may be passed into activation function 1234. From there, it may be eventually passed as output 1244. If a result is not final, it may be written to storage, memory, or otherwise passed to another calculation circuit." [¶0107] “processing device 1000 may include registers for storing weights or input values as well as multiplexers to route values to appropriate multiplication circuits."); 
summation circuits configured to sum the products (Falcon [¶0081] “Algorithms implemented on standard processors such as CPU or GPU may include integer (or fixed-point) multiplication and addition, or float-point fused multiply-add (FMA). These operations involve multiplication operations of inputs with parameters and then summation of the multiplication results."); bias addition circuits configured to add a bias value to the sums (Ma [¶0035] “Their activations can hence be computed with a matrix multiplication followed by a bias offset."); and rectified linear unit (RLU) (Ma [¶0062] “A Rectified Linear Unit (ReLU) unit can check the sign bit of the results") and/or sigmoid function circuits configured to compute RLU and/or sigmoid functions from resulting values (Ma [¶0065] “then the subtraction result can feed into the sigmoid unit").
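For illustration only, the sequence of operations recited in claim 4 (multiplication, summation, bias addition, then a rectified linear unit or sigmoid function) can be sketched for a single neuron. The function name `neuron`, its parameters, and the numeric values are assumptions made for this sketch and are not drawn from Falcon, Ma, or the instant application.

```python
import math


def neuron(weights, activations, bias, nonlinearity="relu"):
    """Compute one neuron output in the four recited stages."""
    products = [w * a for w, a in zip(weights, activations)]  # multiplication
    total = sum(products)                                     # summation
    biased = total + bias                                     # bias addition
    if nonlinearity == "relu":
        return max(0.0, biased)                               # rectified linear unit
    if nonlinearity == "sigmoid":
        return 1.0 / (1.0 + math.exp(-biased))                # sigmoid function
    raise ValueError("nonlinearity must be 'relu' or 'sigmoid'")


relu_out = neuron([1.0, -2.0], [3.0, 1.0], 0.5)
clipped_out = neuron([1.0], [-2.0], 0.0)
sigmoid_out = neuron([1.0, -1.0], [2.0, 2.0], 0.0, "sigmoid")
```

Checking only the sign of the biased sum is sufficient to implement the rectified linear unit, which is consistent with the quoted statement from Ma [¶0062].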

	Regarding claim 5, the combination of Falcon and Huang teaches The apparatus of claim 1, however Falcon and Huang do not explicitly teach wherein the neural network processing circuits are configured to perform backpropagation operations in a parallel on the synaptic weight values.

Ma, who teaches a hardware accelerated method for neural networks, teaches wherein the neural network processing circuits are configured to perform backpropagation operations in a parallel on the synaptic weight values ([¶0038] “Typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, such as DNN 210, CNN 230, or DBN 270, and backpropagation or backprop mode for training or learning using the labeled training datasets" [¶0071] “The NN architecture with memory-centric implementations can leverage the massive parallelism and density of memory-centric design" [¶0006] “the ANN and SNN utilize different techniques on how the data fed through efficiently, computation complexity, memory bandwidth considerations for neurons, synaptic weights and how to accelerate thereof.").

	Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use backpropagation operations in parallel in a neural network based architecture. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Ma [¶0038] that a backpropagation mode is a standard mode of operation in supervised machine learning.

Regarding claim 6, the combination of Falcon, Huang, and Ma teaches The apparatus of claim 5, wherein the neural network processing circuits comprise: a plurality of synaptic weight determination circuits (Falcon Abstract "The processor core includes logic determine a set of weights") formed in parallel and a plurality of synaptic weight update circuits (Falcon [¶0094] “weights 1204 may be scaled up to meet a defined range") formed in parallel (Huang FIG. 2 shows that the wordlines and RRAMs are both disposed in parallel.).

Regarding claim 12, the combination of Falcon and Huang teaches the method of claim 11,  wherein the neural network processing circuits are configured as one or more of under-the-array circuits and next-to-the-array circuits (Huang [p. 282 Sec. III B] "The proposed 3D CMOS-RRAM accelerator is shown in Fig. 2(a). This accelerator is composed of a top layer of wordlines, a bottom layer of CMOS circuits and vertical connection between both layers by RRAM" Bottom layer of CMOS circuits is interpreted as synonymous with under-the-array circuit.). However, the combination of Falcon and Huang does not explicitly teach wherein the neural network operations comprise feedforward operations performed in parallel on the neural network using the neural network processing components.

Ma, who teaches a related art of a hardware accelerated method for neural networks, teaches wherein the neural network processing circuits are configured to perform feedforward neural network operations ([¶0038] “Typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, such as DNN 210, CNN 230, or DBN 270") in parallel using the synaptic weight values ([¶0071] “The NN architecture with memory-centric implementations can leverage the massive parallelism and density of memory-centric design" [¶0006] “the ANN and SNN utilize different techniques on how the data fed through efficiently, computation complexity, memory bandwidth considerations for neurons, synaptic weights and how to accelerate thereof").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use feedforward operations in parallel in a neural network based architecture. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Ma [¶0038] that a feed-forward mode is a standard mode of operation in supervised machine learning.

Regarding claim 15, the combination of Falcon and Huang teaches the method of claim 11, wherein the neural network processing circuits are configured as one or more of under-the-array circuits and next-to-the-array circuits (Huang [p. 282 Sec. III B] "The proposed 3D CMOS-RRAM accelerator is shown in Fig. 2(a). This accelerator is composed of a top layer of wordlines, a bottom layer of CMOS circuits and vertical connection between both layers by RRAM" Bottom layer of CMOS circuits is interpreted as synonymous with under-the-array circuit.). However, the combination of Falcon and Huang does not explicitly teach wherein the neural network operations comprise backpropagation operations performed in parallel on the neural network using the neural network processing components.

Ma, who teaches a hardware accelerated method for neural networks, teaches wherein the neural network processing circuits are configured to perform backpropagation operations in a parallel on the synaptic weight values ([¶0038] “Typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, such as DNN 210, CNN 230, or DBN 270, and backpropagation or backprop mode for training or learning using the labeled training datasets" [¶0071] “The NN architecture with memory-centric implementations can leverage the massive parallelism and density of memory-centric design" [¶0006] “the ANN and SNN utilize different techniques on how the data fed through efficiently, computation complexity, memory bandwidth considerations for neurons, synaptic weights and how to accelerate thereof.").

	Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use backpropagation operations in parallel in a neural network based architecture. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Ma [¶0038] that a backpropagation mode is a standard mode of operation in supervised machine learning.

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon and Huang and in further view of Yaegashi (US 2016/0064409 A1).

Regarding claim 9, the combination of Falcon and Huang teaches The apparatus of claim 1. However, the combination of Falcon and Huang does not explicitly teach wherein the NVM elements comprise NAND flash storage elements.

Yaegashi, who teaches a related art of a stacked non-volatile memory device, teaches wherein the NVM elements comprise NAND flash storage elements (Yaegashi [¶0003] “A NAND-type flash memory device is an example of a non-volatile semiconductor storage device").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use NAND in a stacked memory configuration for a neural network accelerator in place of RRAM. The substitution would have been obvious because a person of ordinary skill in the art would be able to determine from Yaegashi [¶0003] that NAND is an example of non-volatile semiconductor storage that is known in the art. Yaegashi further discloses means of accessing wordlines and bitlines of the stacked memory such that the substitution of the NAND device in Yaegashi for the stacked RRAM device in Huang would be obvious.

Regarding claim 10, the combination of Falcon, Huang, and Yaegashi teaches The apparatus of claim 9, wherein the synaptic weight values are storable vertically on different word lines within the NAND flash storage elements of the die (Yaegashi [¶0022] “A gate (gate electrode) of the memory cell transistor is formed of the electrode film 30 (word line WL)" [¶0003] “there has been proposed a 3D-NAND-type flash memory device where memory cells are stacked on a printed circuit board in a vertical direction.").

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Falcon and Huang and in further view of Yaegashi and in further view of Ma.

Regarding claim 16, the combination of Falcon and Huang teaches The method of claim 11, 
wherein storing the neural network values within the NVM elements comprises storing a plurality of synaptic weight values vertically on separate word lines (Huang [p. 283 Sec. III B] “We need to store the matrix W in RRAM-crossbar by writing the corresponding resistance. To perform the computing, X is converted to wordline voltages and we can obtain the output current I denoting Y”). However, the combination of Falcon and Huang does not explicitly teach within NAND elements of the die, such that feedforward multiply accumulate operations per neuron can be performed in parallel word line after word line in a block.

Yaegashi, who teaches a related art of a stacked non-volatile memory device, teaches within NAND elements of the die (Yaegashi [¶0003] “A NAND-type flash memory device is an example of a non-volatile semiconductor storage device" [¶0036] “The memory cell transistors are thus disposed in a three-dimensional matrix configuration. Each memory cell transistor functions as a memory cell which stores information (data) by storing a charge in the storage layer" [¶0017] “plurality of unit memory cells UC disposed in the row direction have gate electrodes thereof electrically connected with each other by the word lines" [¶0003] “memory cells are stacked on a printed circuit board in a vertical direction").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use NAND in a stacked memory configuration for a neural network accelerator in place of RRAM. The substitution would have been obvious because a person of ordinary skill in the art would be able to determine from Yaegashi [¶0003] that NAND is an example of non-volatile semiconductor storage that is known in the art. Yaegashi further discloses means of accessing wordlines and bitlines of the stacked memory such that the substitution of the NAND device in Yaegashi for the stacked RRAM device in Huang would be obvious.

The combination of Falcon, Huang, and Yaegashi does not explicitly teach such that feedforward multiply accumulate operations per neuron can be performed in parallel word line after word line in a block.

Ma, who teaches a hardware accelerated method for neural networks, teaches such that feedforward multiply accumulate operations per neuron can be performed in parallel word line after word line in a block ([¶0038] “Typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, such as DNN 210, CNN 230, or DBN 270" [¶0071] “The NN architecture with memory-centric implementations can leverage the massive parallelism and density of memory-centric design" [¶0006] “the ANN and SNN utilize different techniques on how the data fed through efficiently, computation complexity, memory bandwidth considerations for neurons, synaptic weights and how to accelerate thereof.").

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use feedforward operations in parallel in a neural network based architecture. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Ma [¶0038] that a feed-forward mode is a standard mode of operation in supervised machine learning.

Claims 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Li (US 2019/0019564 A1).

Regarding claim 26, Huang teaches An apparatus, comprising: a die comprising non-volatile memory (NVM) elements; (See FIG. 2. [p. 282 Sec. III B] "stacking non-volatile memories on top of microprocessors enables cost effective heterogeneous integration").
a plurality of neural network processing circuits formed in the die and configured to read synaptic weight values in parallel from a plurality of word lines of NVM elements of the die and perform neural network operations in parallel using the synaptic weight values; and a circuit formed on the die ([p. 282 Sec. III A] "In one RRAM-crossbar, given the input probing voltage, the current on each bit-line (BL) is the multiplication-accumulation of current through each RRAM device on the BL. Therefore, the RRAM-crossbar array can intrinsically perform the analog matrix-vector multiplication [17]. Given an input voltage vector...where ci,j is configurable conductance of the RRAM resistance Ri,j , which can represent real number of weight." [p. 283 Sec. III D] "the multiplication of small matrix can be performed in a highly parallel fashion on RRAM to speed-up the large neural network processing time" [p. 280 Sec. I] "the 3D CMOS-RRAM integration can further support more parallelism with higher I/O bandwidth in acceleration" Huang explicitly teaches that the RRAM accesses weights from the wordlines to perform multiplication and that the multiplication can be performed in a highly parallel fashion. Huang further teaches that the overall aim of the CMOS-RRAM integration circuit is to support higher parallelism through higher I/O bandwidth.).
and configured to perform a fold operation to: read at least some of the synaptic weight values from a plurality of first word lines of the plurality of word lines, each of the first word lines comprising single-level-cell (SLC) NVM elements, ("A two-dimensional weight is folded into three-dimensional tensor and then decomposes into tensor cores G1,G2, ...Gd" FIG. 2 shows that the explicit word lines are expressed as a single NVM layer. FIG. 2 (b) shows that the word lines represent synaptic weights.).
update the synaptic weight values read from the first word lines using at least one of the plurality of the neural network processing circuits, ([p. 281 Sec. II A] "To build a multi-layer neural network, we propose a layerwise training process based on stack auto-encoder for low rank tensor cores and high compression rate. An auto-encoder layer is to set the layer output T the same as input X and find an optimal weight to represent itself. For example, we need to train a tensorized weight W" Training the weight is interpreted as synonymous with updating the synaptic weight. Setting output T to input X is interpreted as updating the value of input X.). However, Huang does not explicitly teach and store the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements.  

Li teaches and store the updated synaptic weight values in a second word line of the plurality of word lines, the second word line comprising multi-level-cell (MLC) NVM elements. ([¶0196] "the MLC NVM matrix circuit 1900 is also configured to train the resistance of the MLC NVM storage circuits MLC-R.sub.00-MLC-R.sub.mn by supporting backwards propagation of a weight update according to the following formula:"). 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network accelerator in Huang with the multi-level cell NVM elements for neural network acceleration taught in Li. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that while Huang does not explicitly teach multi-level memory cells, Huang implicitly teaches storing updated weight values in stacked memories.  Li is therefore introduced to reinforce and to implicitly teach storing updated weight values in stacked memory in the scope of a neural network accelerator that is interpreted as having similar design goals as the accelerator taught in Huang.  Li further supports the combination in ([¶0242] “The system memory chip 2608 could be connected to the dedicated MLC NVM matrix circuit chip 2602 through a dedicated local bus to improve performance. The dedicated MLC NVM matrix circuit chip 2602 could also be embedded into the SoC 2606 to save power and improve performance.”).

Regarding claim 27, claim 27 effectively mirrors claim 26 and is therefore rejected under a similar interpretation.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124