DETAILED ACTION
This is the response to applicant’s amendment action regarding application number 16/180,462, filed November 5, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed December 16, 2021 has been entered. Examiner acknowledges receipt of Amendments to Application 16/180,462, which include: Amendments to the Claims pp.2-6, Amendments to the Specification pp.7-8, and Remarks pp.9-17 (containing applicant’s amendments). 
Regarding Applicant’s Remarks on p.9, Examiner has acknowledged Claims 2, 3, 12, and 20 have been amended. Claims 1-20 remain pending in the application. 
Regarding Applicant’s Remarks on pp.9-10, Examiner has acknowledged Applicant’s Amendments to the Specification have resolved the objections identified in paragraphs [0030], [0032], [0064]-[0068], and [0090]-[0106] respectively, and therefore those respective objections previously set forth in the Non-Final Office Action mailed August 19, 2021 are withdrawn. 
Regarding Applicant’s Remarks on p.10, Examiner acknowledges that Applicant’s Amendments to the Claims have resolved certain indefiniteness/lack of antecedent basis issues identified in Claims 2, 3, and 20 (and inherited by dependent Claims 3-4 from parent Claim 2), and therefore the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed August 19, 2021 for Claims 2-4 and 20 are withdrawn. However, Examiner notes that the §112(b) issues identified in Claim 15 have not been addressed, and hence the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed August 19, 2021 for Claim 15 are maintained.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/180,462, which include: Remarks pp.9-17 (containing applicant’s arguments). 
Regarding Applicant’s Remarks on pp.9-10 for the identified specification objections in paragraphs [0072] and [0082], Examiner acknowledges Applicant’s arguments, has considered them, and has found them to be persuasive. Therefore, those respective objections previously set forth in the Non-Final Office Action mailed August 19, 2021 are withdrawn.
Regarding Applicant’s Remarks on pp.10-17 for Claims 1-3, 5, and 8 under 35 U.S.C. 102(a)(1) as being anticipated by Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred to as Gokmen '243]; for Claim 4 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Chi et al., PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-based Main Memory, 2016 ACM/IEEE 43rd International Symposium on Computer Architecture, pp.27-39 [hereafter referred to as Chi]; for Claims 7, 16-17, and 20 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Mern et al., Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures, arXiv:1802.06920v1, February 20 2018, 8 pages [hereafter referred to as Mern]; for Claims 6 and 9-13 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Gokmen et al., U.S. PGPUB 2018/0253642, filed 3/1/2017 [hereafter referred to as Gokmen '642]; for Claim 14 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Gokmen '642 as applied to Claim 9, in further view of Mern; for Claim 15 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Gokmen '642 as applied to Claim 9, in further view of Chi; and for Claims 18 and 19 under 35 U.S.C. 103 as being unpatentable over Gokmen '243 in view of Mern as applied to Claim 16, in further view of Gokmen '642, Examiner acknowledges Applicant’s arguments, has considered them, and has found them to be not persuasive. Hence the existing 35 U.S.C. §102(a)(1) and 35 U.S.C. §103 rejections are maintained, and the updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
Regarding Applicant’s Remarks on pp.11-13:
“It appears to be argued, essentially, that Gokmen '243 describes backpropagation training of an ANN, and that calculation of the error signal and applying the error signal to the neuron weights anticipates " ... apply an averaging function across output values successively presented on output control lines of a last layer of the artificial neural network ... "
More specifically, it appears to be argued that the successively generated output maps of Gokmen '243 (See e.g., Fig. 5, Output Maps 530) correspond to " ... output values successively presented on output control lines of a last layer of the artificial neural network ... " It also appears to be argued that the error calculation of Gokmen '243 corresponds broadly to " ... apply an averaging function ... 
Applicant respectfully submits however that even if these correspondences were assumed, for the sake of argument only (and Applicant does not concede this), Gokmen '243 still cannot anticipate claim 1 at least for the reason that the error  calculation of Gokmen '243 is not "... appl[ied] presented ... " as in the present claim 1. 
This deficiency can be illustrated by a comparison of Fig. 5 of Gokmen '243 with Fig. 5 of the present application: 
… Gokmen ‘243 Fig. 5 …
… Present Application, Fig.5 …
Gokmen '243 does not " ... apply an averaging function across ... " the M output maps 530. Rather, an error signal is separately calculated for each of these maps, which are each separately backpropagated to adjust the trainable parameters.
In contrast, Fig. 5 of the present application " ... appl[ies] an averaging function across ... " output layers 546 - they are averaged together in Fig. 5, for example. This is entirely different from the operation of Gokmen '243, which does not anticipate, or teach this claim element.”
Examiner has considered this argument, and finds the argument to be not persuasive. Examiner notes that Applicant’s arguments are directed to the following claim limitation in independent Claim 1: “apply an averaging function across output values successively presented on output control lines of a last layer of the artificial neural network from each iteration of the input value”. Examiner further notes that Applicant’s above arguments rely on a superficial comparison of Gokmen ‘243 Figure 5 with Applicant’s own Figure 5, as a way to show that Gokmen ‘243 Figure 5 does not show an averaging function or error calculation across CNN layers. Applicant is reminded that the claims must be given their broadest reasonable interpretation in light of the specification. Gokmen ‘243 col. 11 lines 25-56 was cited to indicate this teaching: “… according to the CNN training above, the CNN learns to model a dependency between the inputs and the expected outputs in the training data. Mathematically, for a vector of input maps S and a vector of outputs X, the CNN learns a model to reduce an error E between S and X. One such error function is the mean square error between S and X, for example: $E = \sum_t \| f(S(t)) - X(t) \|^2$. Other error functions can include, for example, cross-entropy or logistic loss. …”. According to the above citation, Gokmen ‘243 teaches an error calculation being performed between a set of input maps S and a vector of outputs X, where the error calculation utilizes a mean squared error function between inputs S and outputs X. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers).
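For illustration only, the error expression quoted above and the averaging of squared errors described by the Examiner can be sketched as follows. This is a minimal sketch and is not taken from Gokmen ‘243 or from the present application; the function names and the sample values are hypothetical, and the predictions simply stand in for f(S(t)).

```python
def squared_error_sum(predictions, targets):
    """E = sum over t of ||f(S(t)) - X(t)||^2, following the quoted expression."""
    total = 0.0
    for pred, target in zip(predictions, targets):
        total += sum((p - x) ** 2 for p, x in zip(pred, target))
    return total


def mean_squared_error(predictions, targets):
    """Average of the squared errors over every output element."""
    count = sum(len(pred) for pred in predictions)
    return squared_error_sum(predictions, targets) / count


# Hypothetical values: two samples (t = 0, 1), two outputs per sample.
predictions = [[1.0, 2.0], [3.0, 5.0]]  # stands in for f(S(t))
targets = [[1.0, 0.0], [1.0, 5.0]]      # stands in for X(t)

E = squared_error_sum(predictions, targets)
mse = mean_squared_error(predictions, targets)
```

In this sketch the summed error E is divided by the number of output elements to obtain the mean; other normalizations (for example, dividing by the number of samples only) are equally common.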
Gokmen ‘243 further teaches a neuron control system with a neuron interface component containing a crossbar array and an error calculation module that compares the outputs from the neurons to the training data (a set of inputs) to determine an error signal (Gokmen ‘243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7: “… a node/neuron control system 1900 is shown … A neuron interface 1908 controls neurons on the CNN, determining whether the neurons are in feed forward mode, back propagation mode, or weight update mode. The neuron interface 1908 furthermore provides inputs to input neurons and receives the output from output neurons. An error calculation module 1910 compares the outputs from the neurons to training data 1905 to determine an error signal…”; col.20 lines 61-63: “ … the neuron control system 1900 uses a single RPU array for training multiple layers of the CNN …”). Hence, Gokmen ‘243 effectively teaches a mean squared error calculation (performed by the error calculation module) between a set of inputs and vector of outputs, where this mean squared error calculation represents an averaging function that is being applied across 
Regarding applicant’s Remarks on p.14:
“Independent claim 16 recites, for example, " ... applying a noise reduction function among logit vectors presented by an output layer of the artificial neural network ... "
" ... control circuitry ... configured to ... apply an averaging function across output values successively presented on output control lines of a last layer of the artificial neural network ... "
The Office Action cites Gokmen '243 as teaching this element on essentially the same grounds presented with respect to Claim 1, substituting " ... averaging function ... " for " ... noise reduction function" and substituting " .. .last layer ... " with " ... output layer.
Applicant respectfully submits that Gokmen '243 does not teach this element of claim 16 for at least the same reasons set forth above with respect to independent claim 1. Accordingly, Claim 16 is allowable for at least the same reasons, and Applicant thus respectfully requests withdrawal of the rejection of Claim 16.”
Examiner has considered this argument, and finds the argument to be not persuasive. Examiner notes that Applicant’s arguments are directed to the following claim limitation in independent Claim 16: “… applying a noise reduction function among logit vectors presented by an output layer of the artificial neural network …”. Examiner notes that in light of Applicant’s specification paragraph [0030], the term “logit” refers to non-normalized outputs produced by an artificial neural network: “For inference operations, a final logit vector comprising non-normalized predictions is obtained in a last layer before being fed into a softmax function to generate normalized probabilities for classification”. According to the Non-Final Office Action mailed August 19, 2021, the claim limitation “logit vectors presented by an output layer of the artificial neural network” is taught by the Mern reference (Mern p.1 col.2 2nd paragraph-p.2 col.1 1st paragraph; and p.4 col.2 2nd paragraph), where an ANN that is translated into a spiking neural network for implementation in a neuromorphic circuit produces outputs of unnormalized logits that are passed to a softmax function. With regards to the remaining aspects of the claim limitation: “… applying a noise reduction function among … vectors presented by an output layer of the artificial neural network”, Examiner indicates that under its broadest reasonable interpretation, this aspect of the claim limitation recited in independent Claim 16 is similar in scope to the claim limitation recited in independent Claim 1 that was addressed earlier. Referring to Applicant’s specification paragraph [0030], Applicant indicates that errors are a form of noise: “Thus, a reduction in forward propagation noise is desired for ANNs. … During training operations, the error from a last layer iteration will get compensated at a current layer iteration. 
… This final logit vector will have errors due to the accumulation of forward propagation noise from previous layers. And these errors will cause the classification accuracy to drop …”. This interpretation is also consistent with the term “noise” as defined in the Oxford Dictionary of Computing (6th edition 2008, https://www.oxfordreference.com/view/10.1093/acref/9780199234004.001.0001/acref-9780199234004), which defines noise as being “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.”. Furthermore, Applicant’s own specification paragraph [0043] additionally indicates that “a noise reduction function comprises an averaging function applied across more than one output value or more than one set of output values”, hence making Gokmen ‘243’s teaching of the error calculating module performing an “averaging function” (i.e., mean squared error to determine errors between outputs and inputs) also applicable to teach the “noise reduction function” that is recited in independent Claim 16. As indicated in the Non-Final Office Action mailed August 19, 2021, the motivation to combine both Gokmen ‘243 and Mern references is taught in Mern, as providing logit representations that normalize the outputs of an artificial neural network (ANN) provides a way to convert an ANN into an equivalent spiking neural network in order to be deployed on neuromorphic chips, such that the normalized outputs are set within a defined range of the computing and memory requirements for the neuromorphic chip, thus providing a computationally and energy-efficient way to implement an ANN on an analog neuromorphic chip with minimal loss in performance (Mern p.1 Abstract). 
Therefore, given the evidence shown above, the prior art argument provided by the Applicant is not persuasive, and the prior art rejection is maintained. 
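For illustration only, the specification language quoted above (an averaging function applied across more than one set of output values, and a final logit vector fed into a softmax function) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the three sample logit vectors are hypothetical, and the sketch is not a representation of the circuitry of the present application or of any cited reference.

```python
import math


def average_logits(logit_vectors):
    """Element-wise average across several logit vectors of equal length."""
    n = len(logit_vectors)
    return [sum(column) / n for column in zip(*logit_vectors)]


def softmax(logits):
    """Convert non-normalized logits into normalized probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logit vectors from three iterations of the same input value.
readings = [
    [2.0, 1.0, 0.0],
    [2.5, 0.5, 0.0],
    [1.5, 1.5, 0.0],
]

averaged = average_logits(readings)
probabilities = softmax(averaged)
```

Here the averaging is applied before the softmax, so that the normalized probabilities are computed from a single averaged logit vector.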
Regarding Applicant’s Remarks on p.15:
“Independent claim 9 recites, for example, "... applying a noise reduction function to successive output values presented at the output layer ... "
The Office Action cites Gokmen '243 as teaching this element on essentially the same grounds presented with respect to Claim 1, substituting " ... averaging function ... " for " ... noise reduction function". 
Applicant respectfully submits that Gokmen '243 does not teach this element of claim 9 for at least the same reasons set forth above with respect to independent claim 1. Accordingly, Claim 9 is allowable for at least the same reasons, and Applicant thus respectfully requests withdrawal of the rejection of Claim 16.”
Examiner has considered this argument, and finds the argument to be not persuasive. Examiner notes that Applicant’s arguments are directed to the following claim limitation in independent Claim 9: “… applying a noise reduction function to successive output values presented at the output layer …”. As established in the response to Applicant’s arguments for independent Claim 16, Applicant’s specification paragraph [0030] indicates that errors are a form of noise, which is consistent with the definition of “noise” provided in the Oxford Dictionary of Computing (6th edition, 2008). Furthermore, applicant’s own specification paragraph [0043] additionally indicates that “a noise reduction function comprises an averaging function applied across more than one output value or more than one set of output values”, hence making Gokmen ‘243’s teaching of the error calculating module performing an “averaging function” (i.e., mean squared error to determine errors between outputs and inputs) also applicable to teach the “noise reduction function” that is recited in independent Claim 9. Under its broadest reasonable interpretation in light of the specification, the above evidence presented for independent Claim 16 is also applicable for independent Claim 9, and therefore, the prior art argument provided by the Applicant is not persuasive, and the prior art rejection is maintained.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 15 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 15,
Claim 15 recites the limitation “wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to further instances of the one or more intermediate layers according to at least corresponding node connections”. There is insufficient antecedent basis for the term “further instances of the one or more intermediate layers” in the claim, since there is no earlier reference to any set of “instances of the one or more intermediate layers” in Claim 15 or in independent parent Claim 9 that would indicate these instances associated with the one or more intermediate layers as being “further instances”. For the purposes of examination, this claim limitation will be interpreted as “wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to ”.
In addition, Claim 15 further recites the term “according to at least corresponding node connections” in the limitation “wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to according to at least corresponding node connections”, which renders the claim as being indefinite, since it is unclear whether only a fraction of corresponding node connections within the one or more intermediate layers are coupled to the analog-to-digital conversion circuitry, or whether there are other possible non-corresponding node connections within the one or more intermediate layers that are also coupled to the analog-to-digital conversion circuitry. For the purposes of examination, this claim limitation will be interpreted as “wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to ”.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5, and 8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred to as Gokmen '243].
Regarding original Claim 1, Gokmen '243 teaches
(Original) A circuit, comprising: 
artificial neurons comprising a memory array having non-volatile memory (NVM) elements (Examiner’s note: Gokmen ‘243 teaches resistive processing units (RPUs) within a crossbar array (with the RPUs corresponding to “artificial neurons” in an artificial neural network), with each RPU implemented with resistive random access memory (Gokmen '243 Figure 8, element 820; col.4 lines 53-56: “ANNs are often embodied as so-called “neuromorphic” systems of interconnected processor elements that act as simulated “neurons” and exchange “messages” between each other in the form of electronic signals.”; col.12 lines 45-48: “… the described RPU device can be implemented with resistive random access memory (RRAM), phase change memory (PCM), programmable metallization cell (PMC) memory …”; and col.5 lines 6-13: “Crossbar arrays, also known as crosspoint arrays, cross wire arrays, or RPU arrays, are high density, low cost circuit architectures used to form a variety of electronic circuits and devices, including ANN architectures, neuromorphic microchips and ultra-high density nonvolatile memory. A basic crossbar array configuration includes a set of conductive row wires and a set of conductive column wires formed to intersect the set of conductive row wires.”).); 
neural connections between the artificial neurons comprising interconnect circuitry coupled to control lines of the memory array to subdivide the memory array into a plurality of layers of an artificial neural network (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches resistive processing units (RPUs) within a crossbar array (Gokmen '243 Figure 8, element 820; col.5 lines 6-13). Gokmen ‘243 further teaches control lines originating from a neuron interface component connecting to the crossbar array, where the crossbar array contains input neurons and output neurons for an artificial neural network (i.e., a CNN performing inference operations) such that the control lines and crossbar arrays connect RPUs representing the neurons in multiple layers of a CNN (Gokmen '243 Figure 16, elements $x_1$, $x_2$, $x_3$, $o_1$, $o_2$; col.18 lines 26-48: “FIGS. 16-18 depict aspects of developing, training and using an ANN architecture that includes crossbar arrays of two-terminal, non-linear RPUs according to the present invention. FIG. 16 depicts a starting point for designing a neural network, such as a CNN. In effect, FIG. 16 is an alternative representation of the neural network diagram shown in FIG.3, or in FIG. 5. As shown in FIG.16, the input neurons, which are $x_1$, $x_2$ and $x_3$ are connected to hidden neurons, which are shown by sigma (σ). Weights, which represent a strength of connection, are applied at the connections between the input neurons/nodes and the hidden neurons/nodes, as well as between the hidden neurons/nodes and the output neurons/nodes. … As data moves forward through the network, vector matrix multiplications are performed, wherein the hidden neurons/nodes take the inputs, perform a non-linear transformation, and then send the results to the next weight matrix. This process continues until the data reaches the output neurons/nodes. …”; Figure 19 and col.19 line 60-col.20 line 7; and col.20 lines 61-63: “ … the neuron control system 1900 uses a single RPU array for training multiple layers of the CNN.”).); and 
control circuitry coupled to the interconnect circuitry (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches control lines originating from a neuron interface component connecting to the crossbar array, where the crossbar array contains input neurons and output neurons for an artificial neural network (i.e., a CNN performing inference operations) such that the control lines and crossbar arrays connect RPUs representing the neurons in multiple layers of a CNN (Gokmen '243 Figure 16, elements $x_1$, $x_2$, $x_3$, $o_1$, $o_2$; and Figure 19 and col.19 line 60-col.20 line 7: “ … The neuron control system 1900 includes a hardware processor 1902 and memory 1904. … A neuron interface 1908 controls neurons on the CNN, determining whether the neurons are in feed forward mode, back propagation mode, or weight update mode. The neuron interface 1908 furthermore provides inputs to input neurons and receives the output from output neurons. …”).) and configured to: 
transmit a plurality of iterations of an input value on input control lines of a first layer of the artificial neural network for inference operations by at least one or more additional layers (Examiner’s note: Gokmen ‘243 teaches performing training of a CNN (implemented on a neuron control system with a neuron interface component) by transmitting training epochs consisting of batches of input data through multiple convolutional kernel layers in a forward pass by presenting the input values on input control lines (Gokmen '243 col.8 lines 57-60: “Computing the convolutional layers of the CNN, typically, encompasses more than 90% of computation time in neural network training and inference.”; col.20 lines 61-63; col.19 line 61-col.20 line 7; and col.10 line 56-col.11 line 24: “In one or more embodiments, the CNN training is performed using batches. Accordingly, a batch of the input data to be used for training is selected, as shown at block 608. Using the input maps 410 and the convolutional kernels 420, the output maps 430 are generated as described herein, as shown at block 610. Generating the output maps 330 is commonly referred to as a "forward pass." … The processor subsequently modifies the matrices, including the convolutional kernels and the biases, according to the gradient function, as shown at block 625. The processor ensures that all batches of the data are used for the training, as shown at block 628. … The modified convolutional kernels 420 after being adjusted can be used for further training of the CNN, unless the training is deemed completed, as shown at block 630. For example, the training can be deemed completed if the CNN identifies the inputs according to the expected outputs with a predetermined error threshold. If the training is not yet completed, another iteration, or training epoch is performed using the modified convolutional kernels from the most recent iteration.”).); and 
apply an averaging function across output values successively presented on output control lines of a last layer of the artificial neural network from each iteration of the input value (Examiner’s note: Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7: “… An error calculation module 1910 compares the outputs from the neurons to training data 1906 to determine an error signal. Neuron interface 1908 applies the error signal to the output neurons during a back propagation mode and subsequently triggers a weight update mode to train the weights of the CNN accordingly.”; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24; col.11 lines 25-35: “… according to the CNN training above, the CNN learns to model a dependency between the inputs and the expected outputs in the training data. Mathematically, for a vector of input maps S and a vector of outputs X, the CNN learns a model to reduce an error E between S and X. One such error function is the mean square error between S and X, for example:                         
$E = \sum_t \lVert f(S(t)) - X(t) \rVert^2$. Other error functions can include, for example, cross-entropy or logistic loss. …”; and col.11 lines 36-56: “… The CNN, for each layer, adapts a matrix of weights A and a vector of biases a to optimize E. To this end, in the forward pass, a value for each value of a next layer (B, b) is calculated using values of the current layer (A, a). For example, the computations in the forward pass for a layer can be represented as X=f(S)=Bϕ(AS+a)+b, where, A is the matrix of weights of a current layer, a is a bias vector of the current layer, and B and b are weight matrix and bias of the next layer of the CNN. … In the forward pass, the predicted outputs corresponding to the inputs are evaluated according to the above equation. In the backward pass, partial derivatives of the cost function (E) with respect to the different parameters are propagated back through the CNN. The network weights are then be updated using a gradient-based optimization algorithm, such as the gradient descent. The whole process is iterated until the weights have converged.”).).  
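For illustration only, the error function and forward-pass relation quoted from Gokmen '243 col.11 can be sketched in Python as follows. This is a minimal sketch and not part of the reference: the function names `forward` and `squared_error_sum`, the choice of tanh for ϕ, and the reading of the forward pass as X = f(S) = Bϕ(AS + a) + b are assumptions made for the example.

```python
import math


def forward(S, A, a, B, b, phi=math.tanh):
    """Evaluate X = B * phi(A * S + a) + b for one input vector S.

    A and B are lists of rows; a and b are bias vectors.
    phi defaults to tanh, which is an assumption for this sketch.
    """
    hidden = [
        phi(sum(w * s for w, s in zip(row, S)) + bias)
        for row, bias in zip(A, a)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + bias
        for row, bias in zip(B, b)
    ]


def squared_error_sum(predictions, targets):
    """E = sum over t of ||f(S(t)) - X(t)||^2 for paired vectors."""
    return sum(
        sum((p - x) ** 2 for p, x in zip(pred, target))
        for pred, target in zip(predictions, targets)
    )
```

Note that the quoted formula is a sum of squared norms over t; dividing by the number of samples would give a mean in the strict sense.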
Regarding amended Claim 2, Gokmen '243 teaches
(Currently Amended) The circuit of claim 1, the control circuitry further configured to: 
propagate vectors of analog voltages to the input control lines of the plurality of layers for computation by corresponding artificial neurons of the layers (Examiner’s note: Gokmen ‘243 teaches control lines from the neuron control system connected to crossbar arrays of RPUs forming an ANN (i.e., a CNN performing inference operations), where the CNN performs vector matrix multiplications through the CNN, and where the inputs are presented as voltages to the crossbar array (Gokmen '243 Figure 16, elements $x_1$, $x_2$, $x_3$, $o_1$, $o_2$; Figure 8, elements $V_1$, $V_2$, $V_3$, $I_1$, $I_2$, $I_3$, $I_4$; col.18 lines 26-44; and col.16 lines 20-28: “Input voltages $V_1$, $V_2$, $V_3$ are applied to row wires 802, 804, 806, respectively. Each column wire 808, 810, 812, 814 sums the currents $I_1$, $I_2$, $I_3$, $I_4$ generated by each RPU along the particular column wire. For example, as shown in FIG. 8, the current $I_4$ generated by column wire 814 is according to the equation $I_4 = V_1\sigma_{41} + V_2\sigma_{42} + V_3\sigma_{43}$. Thus, array 800 computes the forward matrix multiplication by multiplying the values stored in the RPUs by the row wire inputs, which are defined by voltages $V_1$, $V_2$, $V_3$.”).); and 
detect electrical currents from corresponding output control lines of the plurality of layers to produce the vectors of analog voltages for introduction to successive layers (Examiner’s note: Gokmen ‘243 teaches a crossbar array comprising conductive wires carrying electrical current in order for each RPU to measure the current during forward matrix multiplication, where the output control lines of the hidden/intermediate layers of the neural network are represented by the crossbar array between RPUs, and the feed-forward operation of the neural network is performed through detection of currents on the wires and generation of voltages (Gokmen '243 col.15 lines 3-28; and col.21 lines 22-64: “The method further includes performing the computations for the forward pass, as shown at block 2430. … During feed-forward operation, the set of input neurons (see FIG. 18) each provide an input voltage in parallel to a respective row of RPU devices, which represent the weights of the convolution kernels. The input voltage correspond to the values in the input data 510, which are converted into column vectors 2310. The RPU devices each have a settable resistance value, such that a current output flows from the RPU device 820 to a respective hidden neuron to represent the weighted input. … The current from each RPU device adds column-wise and flows to a hidden neuron. A set of reference weights have a fixed resistance and combine their outputs into a reference current that is provided to each of the hidden neurons. … The hidden neurons use the currents from the array of RPU devices and the reference weights to read the result of the vector matrix multiplication operation. The hidden neurons then output a voltage of their own to another array of RPU devices. This array performs in the same way, with a column of RPU devices receiving a voltage from their respective hidden neuron to produce a weighted current output that adds row-wise and is provided to the output neuron. 
It should be understood that any number of these stages can be implemented, by interposing additional layers of arrays and hidden neurons.”).).  
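For illustration only, the column-current relation quoted from Gokmen '243 col.16 lines 20-28 (each column wire sums the products of row voltage and RPU conductance) can be sketched as follows. This is a minimal sketch and not part of the reference: the function name `column_currents` and the layout `conductances[j][i]` (column j, row i) are assumed conventions.

```python
def column_currents(conductances, voltages):
    """Return one summed current per column wire.

    conductances[j][i] is the conductance of the RPU at column j, row i
    (an assumed layout), and voltages[i] is the voltage on row wire i.
    Each column current is the sum of voltage * conductance along it.
    """
    return [
        sum(v * g for v, g in zip(voltages, column))
        for column in conductances
    ]


# Three row voltages and four columns, mirroring the 3-by-4 array of FIG. 8.
row_voltages = [1.0, 2.0, 3.0]
rpu_conductances = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.5, 2.0],  # column 4: sigma_41, sigma_42, sigma_43
]
currents = column_currents(rpu_conductances, row_voltages)
```

The fourth entry of `currents` corresponds to the quoted relation for the current on column wire 814.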
Regarding amended Claim 3, Gokmen '243 teaches
(Currently Amended) The circuit of claim 2, 
wherein synaptic weights for the artificial neurons are established as conductance states of the NVM elements (Examiner’s note: Gokmen ‘243 teaches conduction states being generated in forward matrix multiplication, where the conduction state represents the stored weights (Gokmen '243 col.15 lines 16-19: “In forward matrix multiplication, the conduction state (i.e., the stored weights) of the RPU can be read by applying a voltage across the RPU and measuring the current that passes through the RPU.”).).  
Regarding original Claim 5, Gokmen '243 teaches
(Original) The circuit of claim 1, the control circuitry further configured to transmit the input value to achieve a target quantity of propagations through the artificial neural network (Examiner’s note: Gokmen ‘243 teaches control circuitry providing inputs to the interconnect circuitry representing the transmission of input data through an input layer, hidden layers, and output layer of an artificial neural network (Gokmen '243 col.7 lines 57-65), until the data reaches the output layer, such that this feed-… (Gokmen '243 col.18 lines 26-48: “… As shown in FIG. 16, the input neurons, which are $x_1$, $x_2$ and $x_3$ are connected to hidden neurons, which are shown by sigma ($\sigma$). Weights, which represent a strength of connection, are applied at the connections between the input neurons/nodes and the hidden neurons/nodes, as well as between the hidden neurons/nodes and the output neurons/nodes. The weights are in the form of a matrix. As data moves forward through the network, vector matrix multiplications are performed, wherein the hidden neurons/nodes take the inputs, perform a non-linear transformation, and then send the results to the next weight matrix. This process continues until the data reaches the output neurons/nodes. …”; Figure 19 and col.19 line 60-col.20 line 7; col.21 lines 22-64; col.10 line 56-col.11 line 24; and col.18 lines 32-44).), 
wherein each iteration of the target quantity is initiated after a previous introduction of the input value propagates through at least a first layer of the artificial neural network (Examiner’s note: Gokmen ‘243 teaches that each data set within the mini-batches of training data completes a forward and backward pass before processing the next data set, where the control circuitry within the neuron contains a control mechanism to ensure that there is no overlapping of forward pass, backward pass, and weight updates for each propagation at each layer (Gokmen '243 col.7 line 35: “ANN model 300 process data records one at a time, …”; col.18 lines 48-50: “For each data set, when the forward pass and backward pass are completed, a weight update is performed.”; and col.18 line 64-col.19 line 25: “FIG. 18B illustrates a block diagram of a neuron, which is used as a neuron 1800 of a neural network, such as a CNN. The neuron can represent any of the input neurons, the hidden neurons, or the output neurons (see FIG. 16). It should be noted that FIG. 18B shows components to address all three phases of operation: feed forward, back propagation, and weight update. However, because the different phases do not overlap, there will necessarily be some form of control mechanism within in the neuron 1800 to control which component are active. … In feed forward mode, … Block 1804 performs a computation based on the input, the output of which is stored in storage 1805. … The value determined by the function block 1804 is converted to a voltage at feed forward generator 1806, which applies the voltage to the next array. The signal propagates this way by passing through multiple layers of arrays and neurons until it reaches the final output layer of neurons.”).).  
Regarding original Claim 8, Gokmen '243 teaches
(Original) The circuit of claim 1, 
wherein the inference operations comprise computation and forward propagation operations (Examiner’s note: Gokmen ‘243 teaches performing training and inference on a CNN implemented on the circuit involves matrix computation and forward pass operations through each layer in the CNN (Gokmen '243 col.8 lines 57-60: “Computing the convolutional layers of the CNN, typically, encompasses more than 90% of computation time in neural network training and inference.”; col.9 lines 34-37: “The data values for each layer in the CNN is typically represented using matrices (or tensors in some examples) and computations are performed as matrix computations.”; and col.10 lines 56-67: “In one or more embodiments, the CNN training is performed using batches. Accordingly, a batch of the input data to be used for training is selected, as shown at block 608. Using the input maps 410 and the convolutional kernels 420, the output maps 430 are generated as described herein, as shown at block 610. Generating the output maps 330 is commonly referred to as a “forward pass.”).).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred as Gokmen '243] in view of Chi et al., PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-based Main Memory, 2016 ACM/IEEE 43rd International Symposium on Computer Architecture, pp.27-39 [hereafter referred as Chi].
Regarding original Claim 4, Gokmen '243 teaches
(Original) The circuit of claim 2.
However, Gokmen '243 does not teach
sense amplifiers coupled to the output control lines and configured to convert the electrical currents into digital representations for introduction to activation functions that determine the vectors for the successive layers.  
Chi teaches
sense amplifiers coupled to the output control lines and configured to convert the electrical currents into digital representations for introduction to activation functions that determine the vectors for the successive layers (Examiner’s note: Chi teaches a full function subarray hardware design within a ReRAM cell with a sense amplifier component, with connections to the column multiplexers (connected to the crossbar array as well as the address and data lines for each ReRAM cell), and where the sense amplifier component is adapted to serve analog-to-digital conversions (Chi p.30 Figure 3; p.31 Figure 4, element C; p.31 col.2 Benefits of Our Design 1st paragraph – p.32 col.1 1st paragraph: “Benefits of Our Design are two-fold. First, our design efficiently utilizes the peripheral circuits by sharing them between memory and computation functions, which significantly reduces the area overhead. For example, in a typical ReRAM-based neuromorphic computing system [10], DACs and ADCs are used for input and output signal conversions; in a ReRAM-based memory system, SAs and write drivers are required for read and write operations. Yet, SAs and ADCs serve similar functions, while write drivers and DACs do similar functions. In PRIME, instead of using both, we reuse SAs and write drivers to serve ADC and DAC functions by slightly modifying the circuit design.”). Chi further teaches the sense amplifier component within the full function subarray includes a ReLU activation unit after the counters and output register to support CNN convolutions (Chi p.31 Figure 4.C; p.31 col.1 Sense Amplifier, 1st paragraph – p.31 col.2 2nd paragraph: “Sense Amplifier. Figure 4 C shows the SA design with the following modifications as marked in light blue in the figure. … Second, we allow SA’s precision to be configured as any value between 1-bit and Po-bit, controlled by the counter as shown in Figure 4 C. The result is stored in the output registers. 
… Fourth, we add a hardware unit to support ReLU function, a function in the convolution layer of CNN. The circuit checks the sign bit of the result. It outputs zero when the sign bit is negative and the result itself otherwise.”).).  
Both Gokmen '243 and Chi are analogous art since they both teach neural network circuit architectures with crossbar arrays and resistive-based memory.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the resistive processor unit (RPU) as taught in Gokmen '243 and enhance it to include integrated sense amplifiers containing analog-to-digital converters and an activation unit as taught in Chi for implementing the artificial neural network using resistive-based memory. The motivation to combine is taught in Chi, as a way to introduce the required functionality in a limited amount of circuit area, thus allowing for a more space-efficient design for the neural network (Chi p.31 col.2 Benefits of Our Design 1st paragraph – p.32 col.1 1st paragraph: “Benefits of Our Design are two-fold. First, our design efficiently utilizes the peripheral circuits by sharing them between memory and computation functions, which significantly reduces the area overhead. For example, in a typical ReRAM-based neuromorphic computing system [10], DACs and ADCs are used for input and output signal conversions; in a ReRAM-based memory system, SAs and write drivers are required for read and write operations. Yet, SAs and ADCs serve similar functions, while write drivers and DACs do similar functions. In PRIME, instead of using both, we reuse SAs and write drivers to serve ADC and DAC functions by slightly modifying the circuit design.”).
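For illustration only, the sign-bit check that Chi describes for its ReLU hardware unit (output zero when the sign bit indicates a negative result, otherwise pass the result through) can be sketched as follows. This is a minimal sketch and not Chi's circuit: the function name `relu_sign_bit`, the two's complement encoding, and the default 8-bit word width are assumptions.

```python
def relu_sign_bit(word, bits=8):
    """Apply ReLU to a digitized result by inspecting its sign bit.

    word is an unsigned integer holding a two's complement value that is
    `bits` wide (both the encoding and the width are assumptions).
    """
    sign = (word >> (bits - 1)) & 1  # most significant bit is the sign
    return 0 if sign else word


positive = relu_sign_bit(0b00000101)  # +5 passes through unchanged
negative = relu_sign_bit(0b11111011)  # -5 in two's complement is clamped
```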
Claims 7, 16-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred as Gokmen '243] in view of Mern et al., Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures, arXiv:1802.06920v1, February 20 2018, 8 pages [hereafter referred as Mern].
Regarding original Claim 7, Gokmen '243 teaches
(Original) The circuit of claim 1, further comprising: 
a buffer coupled to the control circuitry and configured to store … the plurality of output values for input to the averaging function (Examiner’s note: Gokmen ‘243 teaches a neuron control system  containing memory and a neuron interface component connecting to the input and output neurons, where the neuron control system provides a plurality of input training data to the input neurons and receives a plurality of output values from the output neurons (via forward propagation of a plurality of input data through the network), such that the output is stored in the memory in order for the error calculation module to perform the mean squared error calculation taught in Gokmen '243 col.11 lines 25-35. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers) (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7: “ … The neuron control system 1900 includes a hardware processor 1902 and memory 1904. Training data 1906 for a CNN is stored in the memory 1906 and is used to train weights of the CNN. A neuron interface 1908 controls neurons on the CNN, determining whether the neurons are in feed forward mode, back propagation mode, or weight update mode. The neuron interface 1908 furthermore provides inputs to input neurons and receives the output from output neurons. An error calculation module 1910 compares the outputs from the neurons to training data 1906 to determine an error signal. Neuron interface 1908 applies the error signal to the output neurons during a back propagation mode and subsequently triggers a weight update mode to train the weights of the CNN accordingly.”; col.10 line 56-col.11 line 24; col.11 lines 25-35; and col.11 lines 36-56).).  
However, Gokmen '243 does not teach
… logit vector representations …
Mern teaches
… logit vector representations (Examiner’s note: In light of Applicant’s specification paragraph [0030], the term “logit” refers to non-normalized outputs produced by an artificial neural network: “For inference operations, a final logit vector comprising non-normalized predictions is obtained in a last layer before being fed into a softmax function to generate normalized probabilities for classification”. Mern teaches performing a layer-wise synapse optimization translation at each layer of an ANN (implemented on a neuromorphic chip) to produce a spiking neural network (SNN), where the translated ANN produces logit outputs resulting from input data provided at each layer, where the logit outputs represent unnormalized vectors at the output layer before applying it to a final softmax function. Mern further teaches ten episodes of 200 steps each were used as sample inputs, for a total of 2,000 samples, with each sample provided as successive inputs into the ANN, thus producing corresponding successive outputs (Mern p.1 col.2 2nd paragraph – p.2 col.1 1st paragraph; Mern p.4 col.2 2nd paragraph: “The translated ANN had three layers (two hidden layers), with 64 rectified linear unit (ReLU) neurons at each hidden layer. The ANN outputs unnormalized logits, which were passed to a softmax function and then used as probability masses for each discrete action. … Trajectories from ten episodes of 200 steps each were used as the sample inputs, for a total of 2,000 samples …”).) …
Both Gokmen '243 and Mern are analogous art since they both teach implementing artificial neural networks on neuromorphic circuits.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the outputs of each layer resulting from a forward pass taught in Gokmen '243 and perform the layer-wise synapse optimization translation taught in Mern as a way to produce the logit vectors at the output layer. The motivation to combine is taught in Mern, as providing unnormalized vector representations to a softmax layer to normalize the outputs of an artificial neural network (ANN) (Mern p.1 Abstract: “Deep artificial neural networks (ANNs) can represent a wide range of complex functions. Implementing ANNs in Von Neumann computing systems, though, incurs a high energy cost due to the bottleneck created between CPU and memory. Implementation on neuromorphic systems may help to reduce energy demand. Conventional ANNs must be converted into equivalent Spiking Neural Networks (SNNs) in order to be deployed on neuromorphic chips. This paper presents a way to perform this translation. We map the ANN weights to SNN synapses layer-by-layer by forming a least-square-error approximation problem at each layer. An optimal set of synapse weights may then be found for a given choice of ANN activation function and SNN neuron. Using an appropriate constrained solver, we can generate SNNs compatible with digital, analog, or hybrid chip architectures. We present an optimal node pruning method to allow SNN layer sizes to be set by the designer. To illustrate this process, we convert three ANNs, including one convolutional network, to SNNs. In all three cases, a simple linear program solver was used. The experiments show that the resulting networks maintain agreement with the original ANN and excellent performance on the evaluation tasks. The networks were also reduced in size with little loss in task performance.”).
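For illustration only, the relationship between unnormalized logit vectors, an averaging function over buffered outputs, and a softmax function can be sketched as follows. This is a minimal sketch and not taken from Gokmen '243, Mern, or Applicant's specification: the function names `mean_vector` and `softmax`, and the choice of an element-wise arithmetic mean as the averaging function, are assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise arithmetic mean of equally sized logit vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def softmax(logits):
    """Normalize logits into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two buffered logit vectors from successive propagations of one input.
buffered = [[1.0, 3.0], [3.0, 5.0]]
averaged = mean_vector(buffered)
probabilities = softmax(averaged)
```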
Regarding original Claim 16, Gokmen '243 teaches
(Original) A method comprising: 
introducing an input value to an input layer of an artificial neural network over a target quantity of iterations for propagation through at least one hidden layer of the artificial neural network (Examiner’s note: Gokmen ‘243 teaches performing training of a CNN by transmitting training epochs consisting of batches of input data through multiple convolutional kernel layers in a forward pass by presenting the input values on input control lines, where the convolutional kernel layers correspond to hidden layers of an artificial neural network, and where the batches of input data represent providing inputs over a target quantity of iterations (Gokmen '243 col.18 lines 26-48; col.7 lines 5-17; col.8 lines 57-60: “Computing the convolutional layers of the CNN, typically, encompasses more than 90% of computation time in neural network training and inference.”; lines 61-63; col.19 line 61-col.20 line 7; and col.10 line 56-col.11 line 24: “In one or more embodiments, the CNN training is performed using batches. Accordingly, a batch of the input data to be used for training is selected, as shown at block 608. Using the input maps 410 and the convolutional kernels 420, the output maps 430 are generated as described herein, as shown at block 610. Generating the output maps 330 is commonly referred to as a "forward pass." … The processor subsequently modifies the matrices, including the convolutional kernels and the biases, according to the gradient function, as shown at block 625. The processor ensures that all batches of the data are used for the training, as shown at block 628. … The modified convolutional kernels 420 after being adjusted can be used for further training of the CNN, unless the training is deemed completed, as shown at block 630. For example, the training can be deemed completed if the CNN identifies the inputs according to the expected outputs with a predetermined error threshold. If the training is not yet completed, another iteration, or training epoch is performed using the modified convolutional kernels from the most recent iteration.”).); and 
determining a result by applying a noise reduction function among … vectors presented by an output layer of the artificial neural network after the target quantity of iterations of the input value have completed propagation through the at least one hidden layer (Examiner’s note: As presented earlier in response to Applicant’s arguments, Applicant’s specification paragraph [0030] indicates that errors are a form of noise, which is also consistent with the definition of “noise” provided in the Oxford Dictionary of Computing (6th edition, 2008: “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.”). As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers) (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24; col.11 lines 25-35: “… according to the CNN training above, the CNN learns to model a dependency between the inputs and the expected outputs in the training data. Mathematically, for a vector of input maps S and a vector of outputs X, the CNN learns a model to reduce an error E between S and X. One such error function is the mean square error between S and X, for example: $E = \sum_{t} \lVert f(S(t)) - X(t) \rVert^{2}$. Other error functions can include, for example, cross-entropy or logistic loss. …”; and col.11 lines 36-56).).  
However, Gokmen '243 does not teach
… logit vectors presented by an output layer of the artificial neural network …
Mern teaches
… logit vectors presented by an output layer of the artificial neural network (Examiner’s note: As indicated earlier, Mern teaches a method to perform a layer-wise synapse optimization translation at each layer of an ANN (implemented on a neuromorphic chip) to produce a spiking neural network (SNN), where the translated ANN produces logit outputs resulting from input data provided at each layer, and where the logit outputs represent unnormalized vectors at the output layer before applying it to a final softmax function. Mern further teaches ten episodes of 200 steps each were used as sample inputs, for a total of 200 samples, with each sample provided as successive inputs into the ANN, thus producing corresponding successive outputs (Mern p.1 col.2 2nd paragraph – p.2 col.1 1st paragraph; p.4 col.2 2nd paragraph).) …
Both Gokmen '243 and Mern are analogous art since they both teach implementing artificial neural networks on neuromorphic circuits.

Regarding original Claim 17, Gokmen '243 in view of Mern teaches
(Original) The method of claim 16, comprising: 
determining the result by applying the noise reduction function based at least on averaging the logit vectors resulting from the target quantity of iterations (Examiner’s note: As indicated earlier, Mern teaches performing a layer-wise synapse optimization translation at each layer of an ANN (implemented on a neuromorphic chip) to produce a spiking neural network (SNN), where the translated ANN produces logit outputs resulting from input data provided at each layer, thereby producing logit vectors at the output layer before applying it to a final softmax function. Mern further teaches ten episodes of 200 steps each were used as sample inputs, for a total of 200 samples, with each sample provided as successive inputs into the ANN, thus producing corresponding successive outputs (Mern p.1 col.2 2nd paragraph – p.2 col.1 1st paragraph; and p.4 col.2 2nd paragraph). As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers). 
Gokmen ‘243 further teaches the training of a CNN includes performing forward passes, where the forward pass is done through transmitting training epochs consisting of applying batches of input data on input control lines through multiple convolutional kernel layers (“hidden layers in an artificial neural network”) (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24; and col.11 lines 25-35).).
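For illustration only, the claimed operation of averaging logit vectors over a target quantity of iterations can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Gokmen '243, or Mern: the function names, the additive Gaussian noise model, and the numeric values are all hypothetical.

```python
import random

def noisy_forward(true_logits, sigma, rng):
    """Hypothetical stand-in for one forward pass: ideal logits plus Gaussian noise."""
    return [z + rng.gauss(0.0, sigma) for z in true_logits]

def average_logits(logit_vectors):
    """Element-wise mean of the logit vectors collected over several iterations."""
    n = len(logit_vectors)
    return [sum(column) / n for column in zip(*logit_vectors)]

# Assumed values for illustration: 3 output classes, 64 iterations of the same input.
rng = random.Random(0)
true_logits = [2.0, 1.0, -1.0]
runs = [noisy_forward(true_logits, sigma=0.5, rng=rng) for _ in range(64)]

averaged = average_logits(runs)
# The result is determined from the averaged logits (here, the index of the largest one).
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
```

In this sketch the same input is presented repeatedly and the averaging is applied to the unnormalized output vectors before any final softmax or class selection.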
Regarding amended Claim 20, Gokmen '243 in view of Mern teaches
(Currently Amended) The method of claim 16, further comprising: 
in a memory element coupled to the output layer, storing the output values from the target quantity of iterations for input to the noise reduction function (Examiner’s note: Gokmen ‘243 teaches a neuron control system containing memory elements and a neuron interface component connecting to the input and output neurons, where the neuron control system provides a plurality of input training data to the input neurons through the neuron interface and receives a plurality of output values from the output neurons, such that the output is stored in the memory elements in order for the error calculation module to perform the mean squared error comparison between the outputs from the neurons and the training data to determine the error signal (Gokmen '243 Figure 19 and col.19 line 60-col.20 line 7; and col.11 lines 25-35).).
Claims 6 and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred to as Gokmen '243] in view of Gokmen et al., U.S. PGPUB 2018/0253642, filed 3/1/2017 [hereafter referred to as Gokmen '642].
Regarding original Claim 6, Gokmen '243 teaches
(Original) The circuit of claim 5, comprising: 
the control circuitry configured to select the target quantity for the averaging function to bring a … noise of the artificial neural network to below a threshold level (Examiner’s note: As presented earlier in response to Applicant’s arguments, Applicant’s specification paragraph [0030] indicates that errors are a form of noise, which is also consistent with the definition of “noise” provided in the Oxford Dictionary of Computing (6th edition, 2008: “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.”). Furthermore, Applicant’s specification paragraph [0043] states: “a noise reduction function comprises an averaging function applied across more than one output value or more than one set of output values”. Gokmen ‘243 teaches performing a selection of training epochs containing batches of input data to be used as inputs into the CNN, such that the input data is propagated through the network using a forward pass, and that training is completed when the CNN identifies the inputs according to expected outputs within a predetermined error threshold. As indicated earlier, the error for … (Gokmen '243 Figure 19 and col.19 line 61-col.20 line 7; col.11 lines 25-35; and col.10 line 56-col.11 line 24: “In one or more embodiments, the CNN training is performed using batches. Accordingly, a batch of the input data to be used for training is selected, as shown at block 608. Using the input maps 410 and the convolutional kernels 420, the output maps 430 are generated as described herein, as shown at block 610. Generating the output maps 330 is commonly referred to as a "forward pass." … The processor subsequently modifies the matrices, including the convolutional kernels and the biases, according to the gradient function, as shown at block 625. The processor ensures that all batches of the data are used for the training, as shown at block 628. … The modified convolutional kernels 420 after being adjusted can be used for further training of the CNN, unless the training is deemed completed, as shown at block 630. For example, the training can be deemed completed if the CNN identifies the inputs according to the expected outputs with a predetermined error threshold. If the training is not yet completed, another iteration, or training epoch is performed using the modified convolutional kernels from the most recent iteration.”).).  
However, Gokmen '243 does not explicitly teach
… a forward propagation noise of the artificial neural network …
Gokmen '642 teaches
… a forward propagation noise of the artificial neural network (Examiner’s note: Gokmen ‘642 teaches that peripheral devices such as op-amps and the RPU crossbar array itself introduce noise, where the noise from an RPU crossbar array that is used for forward propagation of input values corresponds to forward propagation noise (Gokmen '642 [0085]: “Analog computation is sensitive to various noise sources such as thermal noise, shot noise, etc., that are all additive and can be modeled as a single unbiased Gaussian noise. …”; and [0102]: “Various noise sources can contribute to total acceptable input referred noise level of an op-amp including thermal noise, shot noise, and supply voltage noise, etc. Thermal noise due to a pair of arrays with 4096x4096 RPU devices can be estimated as 7.0 (nV/√Hz). Depending on the exact physical implementation of an RPU device and type of non-linear I-V response, shot noise levels produced by the RPU array can vary. Assuming a diode-like model, total shot noise from a whole array scales as a square root of a number of active RPU devices in a column (or a row), and hence depends on an overall instantaneous activity of the array.”).) …
Both Gokmen '243 and Gokmen '642 are analogous art since they both teach artificial neural network circuitry with crossbar arrays and resistive processing units.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take a quantity of training epochs containing batches of input data taught in Gokmen '243 and select and perform a fixed number of those training epochs and batches of input data as taught in Gokmen '642 to define a stopping condition to reduce a classification error/noise (and hence reach a target inference accuracy). The motivation to combine is taught in Gokmen '642, since defining a target quantity of training epochs utilizes the locality and parallelism of a feed-forward algorithm such that it averages out the classification errors/noise that may be introduced during forward propagation and the weight updates, thereby minimizing and smoothing out the overall performance impact of those errors/noise on a neural network (Gokmen '642 [0073]: “These results validate that although the updates in the stochastic model are probabilistic, classification errors can become indistinguishable from those achieved with the baseline model. The implementation of the stochastic update rule on an array of analog RPU devices with non-linear switching characteristics effectively utilizes the locality and the parallelism of the algorithm. As a result, the update time is becoming independent of the array size and is a constant value proportional to BL, thus achieving the required O(1) time complexity.”).  
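For illustration only, the relationship between a target quantity of averaged iterations and a noise threshold can be sketched as follows. The sketch assumes independent, zero-mean Gaussian noise of standard deviation sigma on each output (in line with the "single unbiased Gaussian noise" model quoted from Gokmen '642 [0085]), in which case the standard deviation of the mean of N outputs is sigma divided by the square root of N. The function name and the numeric values are hypothetical and do not appear in the application or the cited references.

```python
import math

def iterations_for_threshold(sigma, threshold):
    """Smallest N such that sigma / sqrt(N) is at or below the threshold.

    Assumes independent, zero-mean noise with standard deviation sigma per iteration.
    """
    if sigma <= 0 or threshold <= 0:
        raise ValueError("sigma and threshold must be positive")
    return max(1, math.ceil((sigma / threshold) ** 2))

# Assumed example: per-iteration noise of 4.0 units, target residual noise of 1.0 unit.
target_quantity = iterations_for_threshold(4.0, 1.0)
residual_noise = 4.0 / math.sqrt(target_quantity)
```

Under these assumptions, halving the threshold quadruples the required number of iterations.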
Regarding original Claim 9, Gokmen '243 teaches
(Original) An artificial neural network, comprising: 
an input layer (Examiner’s note: Gokmen ‘243 teaches an ANN model containing a plurality of nodes arranged in layers, one of which is an input layer, with associated weight connections as the directed edges between the input layer and the hidden layer (Gokmen '243 Figure 3; col.7 lines 5-17: “FIG. 3 depicts a simplified ANN model 300 organized as a weighted directional graph, wherein the artificial neurons are nodes (e.g., 302, 308, 316), and wherein weighted directed edges (e.g., m1 to m20) connect the nodes. ANN model 300 is organized such that nodes 302, 304, 306 are input layer nodes, …”).); 
an output layer (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches an ANN model containing a plurality of nodes arranged in layers, one of which is an output layer, with associated weight connections as the directed edges between the output layer and the hidden layer (Gokmen '243 Figure 3; col.7 lines 5-17).); 
one or more intermediate layers between the input layer and the output layer, each comprising one or more nodes having accompanying node connections and synaptic weights (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches an ANN model containing a plurality of nodes arranged in layers, with a hidden layer in between the input and output layers, with associated weight connections as the directed edges between the input layer and the hidden layer, and the output layer and the hidden layer (Gokmen '243 Figure 3; col.7 lines 5-17: “FIG. 3 depicts a simplified ANN model 300 organized as a weighted directional graph, wherein the artificial neurons are nodes (e.g., 302, 308, 316), and wherein weighted directed edges (e.g., m1 to m20) connect the nodes. ANN model 300 is organized such that nodes 302, 304, 306 are input layer nodes, nodes 308,310, 312,314 are hidden layer nodes and nodes 316, 318 are output layer nodes. … Although only one input layer, one hidden layer and one output layer are shown, in practice, multiple input layers, hidden layers and output layers can be provided.”).); 
a control circuit coupled to the input layer and configured to introduce a plurality of successive instances of input data to the input layer for propagation through at least the one or more intermediate layers (Examiner’s note: Under its broadest reasonable interpretation, the term “successive instances of the input data” is interpreted as iterations of input values. As indicated earlier, Gokmen ‘243 teaches control lines originating from a neuron interface component connecting to the crossbar array, where the crossbar array contains input neurons and output neurons for an artificial neural network (i.e., a CNN performing inference operations) such that the control lines and crossbar arrays connect RPUs representing the neurons in multiple layers of a CNN, with the neuron interface providing inputs to the input neurons (Gokmen '243 Figure 16, elements $x_1$, $x_2$, $x_3$, $o_1$, $o_2$; Figure 19 and col.19 line 60-col.20 line 7; and col.20 lines 61-63). As indicated earlier, Gokmen ‘243 teaches performing training of a CNN by transmitting training epochs consisting of batches of input data through multiple convolutional kernel layers in a forward pass by presenting the input values on input control lines, where the convolutional kernel layers correspond to hidden layers of an artificial neural network (Gokmen '243 col.18 lines 26-48; col.7 lines 5-17; col.8 lines 57-60; and col.10 line 56-col.11 line 24).); and 
the control circuit coupled to the output layer and configured to reduce a … noise in a result based at least on applying a noise reduction function to successive output values presented at the output layer resultant from the plurality of successive instances of the input data (Examiner’s note: Under its broadest reasonable interpretation, the term “successive instances of the input data” is interpreted as iterations of input values. As indicated earlier, Gokmen ‘243 teaches control lines originating from a neuron interface component connecting to the crossbar array, where the crossbar array contains input neurons and output neurons for an artificial neural network (i.e., a CNN performing inference operations) such that the control lines and crossbar arrays connect RPUs representing the neurons in multiple layers of a CNN, with the neuron interface providing inputs to the input neurons (Gokmen '243 Figure 16, elements $x_1$, $x_2$, $x_3$, $o_1$, $o_2$; Figure 19 and col.19 line 60-col.20 line 7). Gokmen ‘243 further teaches the training of a CNN includes performing forward and backward passes, where the forward pass is done through transmitting training epochs consisting of applying batches of input data on input control lines through multiple convolutional kernel layers to generate predicted outputs corresponding to the inputs (Gokmen '243 col.7 lines 5-17; col.8 lines 57-60). Furthermore, as presented earlier in response to Applicant’s arguments, Applicant’s specification paragraph [0030] indicates that errors are a form of noise, which is also consistent with the definition of “noise” provided in the Oxford Dictionary of Computing (6th edition, 2008: “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.”). As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.10 line 56-col.11 line 24; col.11 lines 25-35; and col.11 lines 36-56).).  
However, Gokmen '243 does not explicitly teach
… a forward propagation noise …
Gokmen '642 teaches
… a forward propagation noise (Examiner’s note: As indicated earlier, Gokmen ‘642 teaches that peripheral devices such as op-amps and the RPU crossbar array itself introduce noise, where the noise from an RPU crossbar array that is used for forward propagation of input values corresponds to forward propagation noise (Gokmen '642 [0085]; and [0102]). Gokmen ‘642 further teaches performing a selection of mini-batch and epoch training sizes to reduce classification error (interpreted as forward propagation noise, since noise introduced during forward propagation affects both forward propagation and the weight update cycle), where the classification error after a certain number of epochs of input data represents approaching a target classification (inference) accuracy (Gokmen '642 [0068]).) …
Both Gokmen '243 and Gokmen '642 are analogous art since they both teach artificial neural network circuitry with crossbar arrays and resistive processing units.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take a quantity of training epochs containing batches of input data taught in Gokmen '243 and select and perform a fixed number of those training epochs and batches of input data as taught in Gokmen '642 to define a stopping condition to reduce a classification error/noise (and hence reach a target inference accuracy). The motivation to combine is taught in Gokmen '642, as provided in the prior art claim mapping of Claim 6 recited above.  
Regarding original Claim 10, Gokmen '243 in view of Gokmen '642 teaches
(Original) The artificial neural network of claim 9, comprising: 
the control circuit configured to introduce the input data to the input layer for a target quantity of iterations of the input data to propagate through the artificial neural network (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches control circuitry providing inputs to the interconnect circuitry representing the transmission of input data through an input layer, hidden layers, and output layer of an artificial neural network (Gokmen '243 col.7 lines 57-65), until the data reaches the output layer, such that this feed-forward process of transmitting currents and voltages through a network represents the passing-through and calculation of data through the artificial neural network for a target quantity of feed-forward propagations (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.21 lines 22-64; and col.10 line 56-col.11 line 24).), 
wherein each iteration of the target quantity of iterations is initiated after a previous introduction of the input data propagates through at least a first intermediate layer (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches each data set within the mini-batches of training data completes a forward and backward pass before processing the next data set, where the control circuitry within the neuron contains a control mechanism to ensure that there is no overlapping of forward pass, backward pass, and weight updates for each propagation at each layer (Gokmen '243 col.7 line 35: “ANN model 300 process data records one at a time, …”; col.18 lines 48-50: “For each data set, when the forward pass and backward pass are completed, a weight update is performed.”; and col.18 line 64-col.19 line 25: “FIG. 18B illustrates a block diagram of a neuron, which is used as a neuron 1800 of a neural network, such as a CNN. The neuron can represent any of the input neurons, the hidden neurons, or the output neurons (see FIG. 16). It should be noted that FIG. 18B shows components to address all three phases of operation: feed forward, back propagation, and weight update. However, because the different phases do not overlap, there will necessarily be some form of control mechanism within in the neuron 1800 to control which component are active. … In feed forward mode, … Block 1804 performs a computation based on the input, the output of which is stored in storage 1805. … The value determined by the function block 1804 is converted to a voltage at feed forward generator 1806, which applies the voltage to the next array. The signal propagates this way by passing through multiple layers of arrays and neurons until it reaches the final output layer of neurons.”).).  
Regarding original Claim 11, Gokmen '243 in view of Gokmen '642 teaches
(Original) The artificial neural network of claim 9, wherein the noise reduction function comprises an averaging function applied over the successive output values (Examiner’s note: Under its broadest reasonable interpretation, the term “successive output values” is interpreted as indicating the output values are produced over applying iterations of input values. Furthermore, as presented earlier in response to Applicant’s arguments, in light of Applicant’s specification paragraph [0030], Applicant indicates that errors are a form of noise, which is also consistent with the term “noise” as defined in the Oxford Dictionary of Computing (6th edition 2008, as being “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.”). As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers). 
Gokmen ‘243 further teaches the training of a CNN includes performing forward and backward passes, where the forward pass is done through transmitting training epochs consisting of applying batches of input data on input control lines through multiple convolutional kernel layers to generate predicted outputs corresponding to the inputs (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24; col.11 lines 25-35; and col.11 lines 36-56).).  
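For illustration only, the characterization of a mean squared error calculation as an average of squared errors over a set of outputs can be sketched as follows. This uses the textbook definition with a 1/n normalization, which is an assumption of the sketch; the formula quoted from Gokmen '243 col.11 is written as a sum over t. The function name and the sample values are hypothetical.

```python
def mean_squared_error(outputs, targets):
    """Average of the squared differences between outputs and expected values."""
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    squared_errors = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return sum(squared_errors) / len(squared_errors)

# Assumed example: three successive output values compared against expected values.
example_error = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The division by the number of outputs is the step that makes the calculation an averaging operation over the set of outputs.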
Regarding amended Claim 12, Gokmen '243 in view of Gokmen '642 teaches
(Currently Amended) The artificial neural network of claim 9, further comprising: 
an output buffer coupled to the output layer configured to store a portion of the successive output values for input to the noise reduction function (Examiner’s note: Under its broadest reasonable interpretation, the term “successive output values” is interpreted as indicating the output values are produced over applying iterations of input values. Gokmen ‘243 teaches a neuron control system containing memory elements and a neuron interface component connecting to the input and output neurons, where the neuron control system provides a plurality of input training data to the input neurons through the neuron interface and receives a plurality of output values from the output neurons, such that the output is stored in the memory elements (representing output buffers) in order for the error calculation module to perform the mean squared error comparison between the outputs from the neurons and the training data to determine the error signal (Gokmen '243 Figure 19 and col.19 line 60-col.20 line 7; and col.11 lines 25-35).).  
Regarding original Claim 13, Gokmen '243 in view of Gokmen '642 teaches
(Original) The artificial neural network of claim 9, comprising: 
the control circuit configured to select a quantity of the successive instances to reduce the forward propagation noise and reach at least a target inference accuracy in the result (Examiner’s note: Under its broadest reasonable interpretation, the term “a quantity of successive instances” is interpreted as iterations of input values. Gokmen ‘243 teaches performing training of a CNN (implemented on a neuron control system with a neuron interface component) by transmitting training epochs consisting of batches of input data through multiple convolutional kernel layers in a forward pass by presenting the input values on input control lines, where the convolutional kernel layers correspond to hidden layers of an artificial neural network, and where the batches of input data represent providing inputs over a target quantity of iterations (Gokmen '243 col.18 lines 26-48; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24). As indicated earlier, Gokmen ‘642 further teaches that peripheral devices such as op-amps and the RPU crossbar array itself introduce noise, where the noise from an RPU crossbar array that is used for forward propagation of input values corresponds to forward propagation noise (Gokmen '642 [0085]; [0102]). Gokmen ‘642 further teaches performing a selection of mini-batch and epoch training sizes to reduce classification error (interpreted as forward propagation noise, since noise introduced during forward propagation affects both forward propagation and the weight update cycle), where the classification error after a certain number of epochs of input data represents approaching a target classification (inference) accuracy (Gokmen '642 [0068]).). 
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred as Gokmen '243] in view of Gokmen et al., U.S. PGPUB 2018/0253642, filed 3/1/2017 [hereafter referred as Gokmen '642] as applied to Claim 9; in further view of Mern et al., Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures, arXiv:1802.06920v1, February 20 2018, 8 pages [hereafter referred as Mern].
Regarding original Claim 14, Gokmen '243 in view of Gokmen '642 as applied to Claim 9 teaches
(Original) The artificial neural network of claim 9, 
wherein each of the successive output values (Examiner’s note: Under its broadest reasonable interpretation, the term “successive output values” is interpreted as indicating the output values are produced over applying iterations of input values. As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers), and hence this calculation represents an averaging function that is being applied on a set of outputs (given a set of inputs). Gokmen ‘243 further teaches the training of a CNN includes performing forward and backward passes, where the forward pass is done through transmitting training epochs consisting of applying batches of input data on input control lines through multiple convolutional kernel layers to generate predicted outputs corresponding to the inputs (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 61-col.20 line 7; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24; col.11 lines 25-35; and col.11 lines 36-56).) …  
However, Gokmen '243 in view of Gokmen '642 does not teach
… comprise logit vectors prior to introduction to a softmax process.
Mern teaches
comprise logit vectors prior to introduction to a softmax process (Examiner’s note: As indicated earlier, Mern teaches performing a layer-wise synapse optimization translation at each layer of an ANN (implemented on a neuromorphic chip) to produce a spiking neural network (SNN), where the ANN translation produces logit outputs resulting from input data provided at each layer, where the logit outputs represent unnormalized vectors at the output layer before applying it to a final softmax function. Mern further teaches ten episodes of 200 steps each were used as sample inputs, for a total of 200 samples, with each sample provided as successive inputs into the ANN, thus producing corresponding successive outputs (Mern p.1 col.2 2nd paragraph – p.2 col.1 1st paragraph; p.4 col.2 2nd paragraph).).
Both Gokmen '243 in view of Gokmen '642 and Mern are analogous art since they both teach implementing artificial neural networks on neuromorphic circuits.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the outputs of each layer resulting from a forward pass taught in Gokmen '243 in view of Gokmen '642 and perform the layer-wise synapse optimization translation taught in Mern as a way to produce the logit vectors at the output layer. The motivation to combine is taught in Mern, as provided in the prior art claim mapping of Claim 7 recited above.
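For illustration only, the relationship between logit vectors and a softmax process discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Mern or of the applicant: the function name and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Normalize an unnormalized logit vector into values that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logit vector as it might appear at an output layer,
# prior to introduction to the softmax process.
logits = [2.0, 1.0, 0.1]
probabilities = softmax(logits)
```

The logits are unnormalized and may take any real value; only after the softmax step do the values lie between 0 and 1 and sum to 1, with the ordering of the entries preserved.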
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred as Gokmen '243] in view of Gokmen et al., U.S. PGPUB 2018/0253642, filed 3/1/2017 [hereafter referred as Gokmen '642] as applied to Claim 9; in further view of Chi et al., PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-based Main Memory, 2016 ACM/IEEE 43rd International Symposium on Computer Architecture, pp.27-39 [hereafter referred as Chi].
Regarding original Claim 15, Gokmen '243 in view of Gokmen '642 as applied to Claim 9 teaches
(Original) The artificial neural network of claim 9, 
wherein the one or more nodes of each of the one or more intermediate layers comprise 
non-volatile memory elements that store the synaptic weights (Examiner’s note: As indicated earlier, Gokmen ‘243 teaches resistive processing units (RPUs) within a crossbar array (with the RPUs Gokmen '243 Figure 8, element 820; col.4 lines 53-56; col.12 lines 45-48; and col.5 lines 6-13; and col.18 lines 26-44). Gokmen ‘243 further teaches conduction states being generated in forward matrix multiplication, where the conduction states represent stored weights (Gokmen '243 col.15 lines 16-19).) and 
yield node outputs based at least in part on conductance values of the non-volatile memory elements (Examiner’s note: Gokmen ‘243 teaches that each hidden neuron uses currents from the array of RPU devices to perform vector matrix multiplication with the stored weights in each RPU, and then produces output voltages to another array of RPU devices (Gokmen '243 col.21 lines 54-64: “The hidden neurons use the currents from the array of RPU devices and the reference weights to read the result of the vector matrix multiplication operation. The hidden neurons then output a voltage of their own to another array of RPU devices. This array performs in the same way, with a column of RPU devices receiving a voltage from their respective hidden neuron to produce a weighted current output that adds row-wise and is provided to the output neuron. It should be understood that any number of these stages can be implemented, by interposing additional layers of arrays and hidden neurons.”).), …
wherein at least a portion of the forward propagation noise of the artificial neural network is associated with the analog-to-digital conversion circuitry (Examiner’s note: Gokmen ‘642 teaches measuring the noise level based on identifying $t_{meas}$ for an operational-amplifier with an integrated analog-to-digital converter (ADC) (Gokmen '642 [0095]: “ ... an operational amplifier (op-amp) that integrates the differential current on the capacitor $C_{int}$, and an analog-to-digital converter (ADC).”). Gokmen ‘642 further teaches the noise level source can be due to thermal noise, shot noise, supply voltage noise (Gokmen '642 [0102]: “Various noise sources can contribute to total acceptable input referred noise level of an op-amp including thermal noise, shot noise, and supply voltage noise, etc. … The average activity of the network that is typical for the models of FIGS. 2-4 is less than 1% for the backward cycle, while for the forward cycle it is much higher (approaching 20%). Correspondingly, these activities result in shot noise values of 3.1 nV/√Hz and 13.7 nV/√Hz, for backward and forward cycles respectively.”).).  
Gokmen '243 in view of Gokmen '642 does not explicitly teach
… wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to 
Chi teaches
… wherein the node outputs are coupled to analog-to-digital conversion circuitry for introduction to (Examiner’s note: As indicated earlier, Chi teaches a full function subarray hardware design within a ReRAM cell with a sense amplifier component, with connections to the column multiplexers (connected to the crossbar array as well as the address and data lines for each ReRAM cell), and where the sense amplifier component is adapted to serve analog-to-digital conversions (Chi p.30 Figure 3; p.31 Figure 4, element C; p.31 col.2 Benefits of Our Design 1st paragraph – p.32 col.1 1st paragraph). Chi also teaches the sense amplifier component within the full function subarray includes a ReLU activation unit after the counters and output register to support CNN convolutions. Referring to the memory data flow and computation data flow arrows in Chi Figure 4, Chi further teaches the column multiplexers and buffer subarrays forming the data and control connections between ReRAMs, connecting the outputs from the sense amplifier circuitry to other ReRAM units in order to perform CNN computation, where the ReRAM units correspond to “nodes”, and where a plurality of ReRAM units comprise one or more intermediate layers (Chi p.31 Figure 4.C; p.31 col.1 Sense Amplifier, 1st paragraph – p.31 col.2 2nd paragraph; p.29 Section B. Accelerating NNs in Hardware, 1st-3rd paragraphs: “Artificial neural networks (ANNs) are a family of machine learning algorithms inspired by the human brain structure. Generally, they are presented as network of interconnected neurons, containing an input layer, an output layer, and sometimes one or more hidden layers. … ReRAM is becoming a promising candidate to build area-efficient synaptic arrays for NN computation [10]–[13], as it emerges with crossbar architecture. Recently, Presioso et al. fabricated a 12×12 ReRAM crossbar prototype with a fully operational neural network…”; p.31 Figure 4.B and D; p.31 col.1 Column Multiplexer: “In order to support NN computation, we modify the column multiplexers in ReRAM by adding the components marked in light blue in Figure 4B. … After analog processing, the output current is sensed by local SAs.”; and p.31 col.2 Buffer Connection: “Figure 4D shows the communication between the FF subarrays and the Buffer subarray. We enable an FF subarray to access any physical location in a Buffer subarray to accommodate the random memory access pattern in NN computation (e.g., in the connection of two convolutional layers).”).) …
Both Gokmen '243 in view of Gokmen '642 and Chi are analogous art since they both teach neural network circuit architectures with crossbar arrays and resistive-based memory.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the resistive processor unit (RPU) as taught in Gokmen '243 in view of Gokmen '642 and enhance it to include integrated sense amplifiers containing analog-to-digital converters and an activation unit as taught in Chi for implementing the artificial neural network using resistive-based memory. The motivation to combine is taught in Chi, as provided in the prior art claim mapping of Claim 4 recited above.
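For illustration only, the arrangement discussed above (node outputs based on conductance values, then coupled to analog-to-digital conversion) can be sketched as follows. This is a minimal sketch under stated assumptions, not the circuitry of Gokmen '243, Gokmen '642, or Chi: the function names, the row and column orientation, the idealized 8-bit converter, the full-scale value, and all numeric values are hypothetical.

```python
def crossbar_currents(conductances, voltages):
    """Return one summed current per column: I[j] = sum_i V[i] * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def adc_code(value, full_scale, bits):
    """Quantize a non-negative analog value to an unsigned code (idealized ADC)."""
    clamped = min(max(value, 0.0), full_scale)
    levels = (1 << bits) - 1
    return int(clamped / full_scale * levels + 0.5)  # round half up


# Hypothetical conductances in arbitrary units:
# rows are input lines, columns are output lines.
G = [[1.0, 2.0],
     [3.0, 4.0]]
V = [0.5, 0.25]  # hypothetical input voltages, one per row

currents = crossbar_currents(G, V)
codes = [adc_code(i, full_scale=4.0, bits=8) for i in currents]
```

In this idealized model the only error introduced at conversion is the rounding to the nearest code; a physical converter would add further noise sources that the sketch does not model.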
Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gokmen, Tayfun, U.S. Patent 9,646,243, issued 5/9/2017 [hereafter referred as Gokmen '243] in view of Mern et al., Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures, arXiv:1802.06920v1, February 20 2018, 8 pages [hereafter referred as Mern] as applied to Claim 16; in further view of Gokmen et al., U.S. PGPUB 2018/0253642, filed 3/1/2017 [hereafter referred as Gokmen '642].
Regarding original Claim 18, Gokmen '243 in view of Mern as applied to Claim 16 teaches
(Original) The method of claim 16, 
wherein the result is computed to reduce … noise … through the at least one hidden layer of the artificial neural network (Examiner’s note: As presented earlier in response to Applicant’s arguments, in light of Applicant’s specification paragraph [0030], Applicant indicates that errors are a form of noise, which is also consistent with the term “noise” as defined in the Oxford Dictionary of Computing (6th edition, 2008) as being “any signal that occurs in an electronic or communication system and is considered extraneous to the desired signal being propagated. Noise can be introduced, for example, by external disturbances and may be deleterious in a given system since it can produce spurious signals, i.e., errors.” As indicated earlier, Gokmen ‘243 teaches training a CNN, where the CNN is implemented by a neuron control system containing an error calculation module performing a mean square error calculation between the inputs and output values, where the outputs are produced via forward propagation of data present in the training epochs through one or more additional layers. A person having ordinary skill in the art would understand that a mean squared error calculation is a statistical calculation that measures the average of the squares of the errors (where the error is calculated between the input layer and the output layer, including the intermediate layers). Gokmen ‘243 further teaches that the calculated error is used in the backward pass to update the weights in the CNN to adjust for the error, thereby reducing noise (Gokmen '243 col.18 lines 26-48; Figure 19 and col.19 line 60-col.20 line 7; col.10 line 56-col.11 line 24; col.11 lines 25-35; and col.11 lines 36-56).).  
However, Gokmen '243 in view of Mern does not teach
… forward propagation noise associated with processing of the input value ….
Gokmen '642 teaches
… forward propagation noise associated with processing of the input value (Examiner’s note: As indicated earlier, Gokmen ‘642 teaches peripheral devices such as op-amps and the RPU crossbar array itself introduce noise, where the noise from an RPU crossbar array that is used for forward propagation of input values corresponds to forward propagation noise (Gokmen '642 [0085]; and [0102]). Gokmen ‘642 further teaches performing a selection of mini-batch and epoch training sizes to reduce classification error (interpreted as forward propagation noise, since noise introduced during forward propagation affects both forward propagation and the weight update cycle), where the classification error after a certain number of epochs of input data represents approaching a target classification (inference) accuracy (Gokmen '642 [0068]).) …
Both Gokmen ‘243 in view of Mern and Gokmen '642 are analogous art since they both teach artificial neural network circuitry with crossbar arrays and resistive processing units.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take a quantity of training epochs containing batches of input data taught in Gokmen '243 in view of Mern and select and perform a fixed number of those training epochs and batches of input data as taught in Gokmen '642 to define a stopping condition to reduce a classification   
Regarding original Claim 19, Gokmen '243 in view of Mern, in further view of Gokmen '642 teaches
(Original) The method of claim 18, further comprising: 
selecting the target quantity of iterations to reduce forward propagation noise of the artificial neural network and reach a target inference accuracy in the result (Examiner’s note: Gokmen ‘243 teaches performing training of a CNN (implemented on a neuron control system with a neuron interface component) by transmitting training epochs consisting of batches of input data through multiple convolutional kernel layers in a forward pass by presenting the input values on input control lines, where the convolutional kernel layers correspond to hidden layers of an artificial neural network, and where the batches of input data represent providing inputs over a target quantity of iterations (Gokmen '243 col.18 lines 26-48; col.7 lines 5-17; col.8 lines 57-60; col.10 line 56-col.11 line 24). As indicated earlier, Gokmen ‘642 further teaches peripheral devices such as op-amps and the RPU crossbar array itself introduce noise, where the noise from an RPU crossbar array that is used for forward propagation of input values corresponds to forward propagation noise (Gokmen '642 [0085]; [0102]). Gokmen ‘642 further teaches performing a selection of mini-batch and epoch training sizes to reduce classification error (interpreted as forward propagation noise, since noise introduced during forward propagation affects both forward propagation and the weight update cycle), where the classification error after a certain number of epochs of input data represents approaching a target classification (inference) accuracy (Gokmen '642 [0085]; [0102]; and [0068]: “Here, the mini-batch size of unity is chosen throughout the following experiments. Training is performed repeatedly for all 60,000 images in the training dataset, and 60,000 images constitutes a single training epoch. Learning rates of η=0.01, 0.005, and 0.0025 for epochs 0-10, 11-20, and 21-30, respectively, are used. The baseline model reaches classification error of 2.0% on the test data in 30 epochs.”).).
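For illustration only, the training schedule quoted above from Gokmen '642 [0068] can be sketched as follows. This is a minimal sketch that takes the epoch ranges, learning rates, mini-batch size, and image count from the quoted passage; the function and constant names are hypothetical, and the sketch does not represent code from the reference.

```python
IMAGES_PER_EPOCH = 60_000  # per the quoted passage
MINI_BATCH_SIZE = 1        # "mini-batch size of unity"


def learning_rate_for_epoch(epoch):
    """Return the learning rate for an epoch index, per the quoted ranges."""
    if 0 <= epoch <= 10:
        return 0.01
    if 11 <= epoch <= 20:
        return 0.005
    if 21 <= epoch <= 30:
        return 0.0025
    raise ValueError("epoch is outside the ranges given in the quoted passage")


# Number of weight updates in one epoch at the quoted mini-batch size.
updates_per_epoch = IMAGES_PER_EPOCH // MINI_BATCH_SIZE
```

With a mini-batch size of one, each of the 60,000 images in an epoch corresponds to one iteration of input data.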

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121