DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

Response to Amendment
Applicant’s amendment has obviated most, but not all, of the objections to the specification, drawings, and claims previously given.  To the extent that an objection or rejection appears in the previous Office Action(s) but not this Office Action, that objection or rejection is withdrawn.  To the extent that it appears both in a previous Office Action(s) and this Office Action, the objection or rejection is maintained.
Applicant’s amendment has also obviated the outstanding rejections under 35 USC § 112(b).  Therefore, those rejections are withdrawn.  Applicant’s amendment changing the nonce term “units” to “nodes” also eliminates the means-plus-function interpretation of the claims that contained the term “units,” because a “node,” in the neural network art, generally denotes either a neuron in a neural network or a hardware processor for implementing one.  Therefore, the interpretation of the claims under 35 USC § 112(f) is withdrawn.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on December 1, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to because (a) in Fig. 8B, reference character 860, “back propagation to” should be “back propagation on”; and (b) in Fig. 11H, reference character 1258, “Feedback” should be .  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 210, “data … is illustrated” (penultimate sentence) should be “data … are illustrated”.
In paragraph 217, “engine, however, this” should be “engine; however, this”.
In paragraph 239, “less processing nodes” should be “fewer processing nodes”.
In paragraph 243, the symbol representing “series 4 data packets” (circle with double wavy line) should be inserted for consistency with the remainder of the paragraph.  
In paragraph 252, “less cross over data transfers” should be “fewer cross over data transfers”; “cross overs and beneficially” should be “cross overs, beneficially”.
In paragraph 268, “Fig. 15” should be “Fig. 14” because only Fig. 14 has the reference characters 2000 and 2002 recited in the paragraph.
In paragraph 293, “engine is comprised of” should be “engine comprises”.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 7, and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 5226092) (“Chen”) in view of Torng et al. (US 20180285713) (“Torng”) and further in view of Pechanek et al. (US 5509106) (“Pechanek”).
Regarding claim 1, Chen discloses “[a] device for performing training of a neural network via backward propagation, the device comprising: 
a memory configured to store input values and corresponding expected output values (Chen claim 1 discloses that an artificial neural network is configured to accept an input pattern comprising a first data set stored in memory and also accepts a desired [expected] output pattern comprising a third data set stored in the memory); 
at least one … memory configured to store weights corresponding to connections between nodes of the neural network (Chen claim 1 discloses that the set of interconnection weight values are stored in a memory coupled to the computer as a data set W); and 
a plurality of processing nodes that performs computations according to particular nodes of the neural network (Chen claim 1 discloses a method for adjusting a set of interconnection weight values between processing units [nodes] in an ANN, wherein each processing unit is configured to accept a plurality of inputs and provide an output according to a pre-determined activation function [application of activation function = computation]), …;
 wherein each processing node of the plurality of processing nodes is configured to perform computations for forward propagation of the input values through layers of the neural network to generate predicted output values (Chen claim 1 discloses that each processing unit is arranged to accept a plurality of inputs and provide an output according to a pre-determined activation function and that the ANN is arranged to accept, in an input layer, an input pattern; process that input pattern in a forward pass [propagation] of the ANN according to the pre-determined activation functions; and map the input pattern to an actual output pattern [predicted output values]) by at least:  
receiving input data (Chen claim 1 discloses that each processing unit is arranged to accept a plurality of inputs); 
Chen claim 1 discloses that each processing unit is arranged to accept a plurality of inputs and provide an output according to a predetermined activation function and that the ANN is arranged to accept the input and process it in a forward pass of the ANN according to the pre-determined activation function, the set of weight values, and corresponding interconnections); and 
generating … output data including the activation output and a tag identifying the particular processing node (Chen claim 1 discloses that the ANN processes the input pattern in a forward pass according to pre-determined activation functions [thereby generating an activation output]; Fig. 6B and col. 11, ll. 15-31 disclose that a unit data table contains a unit identifier column that identifies [tags] all units comprising the network and a layer identification column that shows the layer to which each unit belongs [which would be generated by the system prior to the running of the network]; col. 18, l. 50-col. 19, l. 13 disclose that whether an output value is calculated by the processing unit depends on a determination by the processing unit of whether the unit is a second layer unit by checking the layer identification slot); and 
wherein the plurality of processing nodes is configured to perform computations for backward propagation of differences between the expected output values and corresponding ones of the predicted outputs at least partly based on the tags of the particular processing nodes (Chen col. 6, ll. 27-42 disclose that gradient descent-type weight adjustments have typically occurred via backpropagation, which comprises taking, for a given training input pattern, the collective error found by comparing the actual output pattern with a desired output pattern and propagating that error back through the neural network; Fig. 6B, claim 1, and col. 11, ll. 15-31 disclose that these calculations are based on a unit data table that contains a unit identifier column that identifies [tags] all units comprising the neural network and a layer identification column that shows the layer to which each unit belongs; col. 18, l. 50-col. 19, l. 13 discloses that whether the processing unit calculates an output value depends on whether the unit performing the processing is a second layer unit, which is determined by checking the layer identification slot [so the output is based in part on the tags]), wherein the backward propagation updates the weights (Chen, col. 6, ll. 27-42 discloses that backpropagation involves, inter alia, adjusting a weight value of each connection by the delta values found through application of a generalized form delta rule).”  
Chen appears not to disclose explicitly the further limitations of the claim.  However, Torng discloses “at least one additional memory configured to store weights corresponding to connections between nodes of the neural network (Torng claim 1 discloses a memory subsystem coupled to CNN logic circuits comprising a first memory for storing a set of weights that require higher retention rate than input signals and a second memory for storing the input signals [i.e., the memory storing the weights is separate from the memory storing the input signals])….”
Torng and the instant application both relate to physical implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen to add an additional memory specifically to store the weights, as disclosed by Torng, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would separate memories that have different data requirements, thereby improving efficiency.  See Torng, paragraph 47 (one memory is for storing data that do not change and the other is for storing data that change, requiring frequent read/write operations).
Neither Chen nor Torng appears to disclose explicitly the further limitations of the claim.  However, Pechanek discloses “a plurality of processing nodes …, wherein each processing node of the plurality of processing nodes includes a separate circuit (neurons modeled on a neural processor may be simulated in a “direct” and/or a “virtual” implementation; in a direct method, each neuron has a physical processing element (PE) available which may operate simultaneously in parallel with the other neuron PEs active in the system – Pechanek, col. 2, ll. 14-25); … [and]
a triangular neural array processor (T-SNAP) unit for use in a neural network contains a reverse feedback loop for communicating the output of a sigmoid generator back to input multipliers of selected neurons – Pechanek, abstract)….”
Pechanek and the instant application both relate to physical implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen and Torng to include a separate circuit for each processing node, as disclosed by Pechanek, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the performance of the network relative to approaches in which multiple neurons are assigned to each PE.  See Pechanek, col. 2, ll. 14-25.

Regarding claim 7, Chen, as modified by Pechanek and Torng, discloses that “the at least one additional memory storing weights comprises a plurality of local memories within corresponding processing nodes (Torng claim 1; Fig. 2B; and paragraphs 10 and 47 disclose CNN processing units each of which comprises logic circuits operatively coupled to two memory subsystems, one storing inputs and the other storing weights [i.e., there is a separate memory storing weights in each processing node]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen and Pechanek to include separate memories within the nodes to store weights, as disclosed by Torng, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would separate memories that have different data requirements, thereby improving efficiency.  See Torng, paragraph 47 (one memory is for storing data that do not change and the other is for storing data that change, requiring frequent read/write operations).

Regarding claim 9, Chen, as modified by Torng and Pechanek, discloses that “a particular processing node of the plurality of processing nodes includes an input systolic element configured to receive the input data, processing circuitry configured to perform processing on the received input data to generate the activation output, and an output systolic element configured to systolically output the activation output (Chen Fig. 4 and col. 9, ll. 30-60 disclose that a single unit of the processing units receives an input from a source [i.e., has an input systolic element], weights the input according to a preselected weighting function and applies an activation function [i.e., has processing circuitry to perform processing and generate an activation output], and outputs the activation output [i.e., has an output systolic element to output the activation output]).”

Regarding claim 10, Chen, as modified by Torng and Pechanek, discloses that “the plurality of processing nodes is arranged in a plurality of arrangements, wherein each of the plurality of arrangements is configured to perform computations of a corresponding layer of the neural network (Chen claim 1 discloses a method for adjusting interconnection weight values for processing units in an ANN, wherein the ANN comprises interconnected processing units arranged in layers including an input layer [first arrangement], a second layer connected to the input layer [second arrangement], and an output layer [third arrangement], and that each processing unit accepts a plurality of inputs and provides an output according to a pre-determined activation function).”  

Regarding claim 11, Chen, as modified by Torng and Pechanek, discloses that “a particular arrangement of the plurality of arrangements includes a first subset of the plurality of processing nodes, wherein, during the backward propagation, the first subset of the plurality of processing nodes is configured to compute partial derivatives based at least partly on the tags (Chen col. 6, ll. 27-42 discloses that during backpropagation, the weight value of each connection is adjusted by the “delta” value found through application of a generalized form delta rule, which states that the weight change is proportional to the partial derivative of the error with respect to the weights; Fig. 6B and col. 18, l. 50-col. 19, l. 6 disclose that each unit has a unit identifier [tag], each layer has a layer identifier, and an output value is calculated based on whether the unit is a second layer unit, which is determined by checking the layer identifier; col. 14, ll. 14-68 discloses that the delta values computed during backpropagation are stored in a value storage table, which contains a set of all delta values for each unit on a given input pattern [since the delta values are stored on a per-unit basis and the units operate with identifiers/tags, the calculation of the delta values is based on the identifiers/tags]), wherein a weighted sum of the partial derivatives is accumulated as the data are propagated backwards through the particular arrangement (Chen col. 6, l. 56-col. 7, l. 8 (esp. Eq. (11)) disclose that the delta value for a hidden layer is calculated as a weighted sum of the delta values from a previous layer [which are calculated as partial derivatives of the error with respect to the weights]; the backpropagation technique then feeds these error terms back to all of the units that feed the output layer, computing a delta value for each of those units; this propagates the errors back one layer).”
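For illustration only, the accumulation discussed above for claim 11 can be sketched using the textbook form of the generalized delta rule.  This is a minimal sketch that assumes sigmoid activations (so that the derivative of the activation is o(1 - o)); it is not reproduced from Chen, and the function and parameter names are hypothetical.

```python
def output_deltas(targets, outputs):
    """Delta for each output unit k: (t_k - o_k) * o_k * (1 - o_k), assuming a sigmoid."""
    return [(t - o) * o * (1.0 - o) for t, o in zip(targets, outputs)]


def hidden_deltas(hidden_outputs, weights, downstream_deltas):
    """Delta for each hidden unit j: o_j * (1 - o_j) * sum_k w_jk * delta_k.

    weights[j][k] is the weight on the connection from hidden unit j to
    downstream unit k; downstream_deltas are the deltas of the layer that was
    processed immediately before this one in the backward pass.
    """
    deltas = []
    for j, o_j in enumerate(hidden_outputs):
        # Weighted sum of the downstream deltas, accumulated over connections.
        weighted_sum = sum(w_jk * d_k for w_jk, d_k in zip(weights[j], downstream_deltas))
        deltas.append(o_j * (1.0 - o_j) * weighted_sum)
    return deltas


def weight_update(learning_rate, delta_k, o_j):
    """Generalized delta rule change for the weight from unit j to unit k."""
    return learning_rate * delta_k * o_j
```

For example, a hidden unit with output 0.5 connected by weights 2.0 and 1.0 to downstream units with deltas 0.1 and 0.2 receives a delta of 0.25 × (0.2 + 0.2) = 0.1, which illustrates how each hidden-layer delta is a weighted sum of the deltas propagated back from the following layer.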

Claims 2-6 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Torng and Pechanek and further in view of Ross et al. (US 9710748) (“Ross”).
Regarding claim 2, neither Chen, Torng, nor Pechanek appears to disclose explicitly the further limitations of the claim.  However, Ross discloses that “the device includes a systolic processor chip, and … the plurality of processing nodes comprises circuitry embedded in the systolic processor chip (Ross col. 2, ll. 51-65 and col. 4, l. 66-col. 5, l. 8 disclose that a matrix computation unit [systolic processor chip] is a systolic array or other circuitry that can perform mathematical operations and that the circuit can process neural network layers [comprising neurons/processing nodes] that have a number of inputs larger than a size of a dimension of the matrix computation unit).”
Ross and the instant application both relate to hardware implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen, Torng, and Pechanek to include a systolic processor chip and instantiate the nodes of the network on circuitry in the chip, as disclosed by Ross, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the efficiency of the processing of the network relative to a network implemented solely in software.  See Ross, col. 2, ll. 51-65.

Ross Fig. 3 and col. 2, ll. 3-24 disclose that the systolic array has a row dimension [arrangement 1] through which weight inputs are shifted and a column dimension [arrangement 2] through which activation inputs are shifted; when the count of activation inputs is greater than a size of the second dimension, the array divides weight inputs into portions; generates a portion of accumulated values; and combines each portion of accumulated values to generate a vector of accumulated values for a given layer [i.e., the row and column arrangements are assigned to subsets of each layer of the neural network while that layer is being processed]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen, Torng, and Pechanek to arrange the processing nodes in particular arrangements corresponding to layers, as disclosed by Ross, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the efficiency of the processing of the network relative to a network implemented solely in software.  See Ross, col. 2, ll. 51-65.
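The portioning described in Ross, col. 2, ll. 3-24, as characterized above, can be illustrated arithmetically.  This is a minimal sketch that models only the arithmetic (dividing the inputs into portions, generating partial accumulated values, and combining them), not Ross’s cell-level shifting of weight and activation inputs; the parameter array_size is a hypothetical stand-in for the size of the array dimension that receives activation inputs.

```python
def accumulate_in_portions(weights, activations, array_size):
    """Compute y_i = sum_j weights[i][j] * activations[j] in portions.

    When there are more activation inputs than array_size, the inputs (and
    the matching weight columns) are split into portions, a partial
    accumulated value is produced for each portion, and the partial values
    are combined into one vector of accumulated values for the layer.
    """
    if array_size <= 0:
        raise ValueError("array_size must be positive")
    n_inputs = len(activations)
    accumulated = [0.0] * len(weights)
    for start in range(0, n_inputs, array_size):
        end = min(start + array_size, n_inputs)
        for i, row in enumerate(weights):
            # Partial accumulated value for this portion of the inputs only.
            partial = sum(row[j] * activations[j] for j in range(start, end))
            # Combine this portion with the running total for output i.
            accumulated[i] += partial
    return accumulated
```

Because addition is associative, the combined portions yield the same vector of accumulated values as a single pass over all inputs; the portioning simply lets a fixed-size array handle a layer with more inputs than the array dimension.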

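Regarding claim 4, neither Chen, Torng, nor Pechanek appears to disclose explicitly the further limitations of the claim.  However, Ross discloses that “a first arrangement of first processing nodes of the systolic array is configured to systolically pulse values output by the first processing nodes to a second arrangement of the systolic array during the forward propagation (Ross col. 5, l. 66-col. 6, l. 3 discloses that on a given clock cycle, each cell processes a given weight input and activation input to generate an accumulated output, which can be passed [systolically pulsed] to an adjacent cell; col. 2, ll. 2-7 discloses that the weight inputs are shifted through a first plurality of cells [first arrangement] along a first dimension of the systolic array, and the activation inputs are shifted through a second plurality of cells [second arrangement] along a second dimension of the array).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen, Torng, and Pechanek to systolically pulse values output by a first arrangement of processing nodes to a second arrangement during the forward propagation, as disclosed by Ross, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the efficiency of the processing of the network relative to a network implemented solely in software.  See Ross, col. 2, ll. 51-65.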

Regarding claim 5, Chen, as modified by Torng, Pechanek, and Ross, discloses that “the first processing nodes are configured to accumulate partial derivatives based on values received from the second arrangement during the backward propagation (Chen col. 6, l. 56-col. 7, l. 8 (esp. Eq. (11)); col. 6, ll. 27-37; and claim 1 disclose that the delta value for a hidden layer [first processing nodes] is calculated as a weighted sum [accumulation] of the delta values from a previous layer [second arrangement; note that the delta values are calculated as partial derivatives of the error with respect to the weights]; the backpropagation technique then feeds these error terms back to all of the units that feed the output layer, computing a delta value for each of those units; this propagates the errors back one layer).”  

Regarding claim 6, neither Chen, Torng, nor Pechanek appears to disclose explicitly the further limitations of the claim.  However, Ross discloses that “the systolic processor chip is configured to: 
systolically pulse data in a first direction through the plurality of processing nodes during the forward propagation (Ross col. 5, l. 66-col. 6, l. 3 discloses that on a given clock cycle, each cell processes a given weight input and activation input to generate an accumulated output, which can be passed [systolically pulsed, in a first direction] to an adjacent cell [processing node]), and 
systolically pulse data in a second direction through the plurality of processing nodes during the backward propagation, wherein the second direction is opposite the first direction (Ross col. 3, ll. 45-50 discloses that the layers of the neural network can be arranged such that the output of a layer can be sent back as an input to a previous layer; col. 5, l. 66-col. 6, l. 3 discloses that on a given clock cycle, each cell processes a given weight input and activation input to generate an accumulated output, which can be passed [systolically pulsed] to an adjacent cell [i.e., since the forward propagation of the neural network corresponds to passing accumulated outputs from cell to cell in one direction, and since the network can be arranged such that the output of a layer can be sent back to a previous layer, it follows that the backward propagation of the network corresponds to passing outputs from cell to cell in the opposite direction]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen, Torng, and Pechanek to pulse the data bidirectionally, as disclosed by Ross, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would implement the backpropagation of the neural network in a manner that improves the efficiency of the processing of the network relative to a network implemented solely in software.  See Ross, col. 2, ll. 51-65.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Torng and Pechanek and further in view of Chi et al., “PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory,” in 2016 ACM/IEEE Ann. Int’l Symp. Computer Architecture 27-39 (2016) (“Chi”).
Regarding claim 8, neither Chen, Torng, nor Pechanek appears to disclose explicitly the further limitations of the claim.  However, Chi discloses that “the at least one additional memory storing weights is disposed external to the plurality of processing nodes, and …
the plurality of processing nodes are configured to fetch identified ones of the weights from the at least one additional memory (Chi Figs. 3(c) and 4 and sec. III, first two paragraphs and sec. III(A) disclose a processing-in-memory architecture that partitions a ReRAM bank into memory subarrays, full function subarrays, and buffer subarrays; the memory subarrays have only data storage capability, whereas the FF subarrays have both computation and data storage capabilities and execute NN computation in computation mode [i.e., the arrays for processing are external to the arrays exclusively for memory]; p. 31, first paragraph discloses that the arrays may store positive and negative weights, respectively; p. 32, last paragraph before sec. III(C) discloses that data are loaded into the memory subarray in order to fetch the data [including the weights] for the FF subarrays).”  
Chi and the instant application both relate to physical implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chen, Torng, and Pechanek to place the weight memory outside of the processing node and fetch the weights from the memory, as disclosed by Chi, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance by separating the processor from the memory.  See Chi, abstract.

Claims 12-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Girones et al., “Systolic Implementation of a Pipelined On-Line Backpropagation,” in Proc. Seventh Int’l Conf. Microelectronics for Neural, Fuzzy, and Bio-Inspired Sys. 387-94 (1999) (“Girones”) in view of Baji et al. (US 5091864) (“Baji”).
Regarding claim 12, Girones teaches “[a] method for training of a neural network via an array of systolic processing units, the method comprising: 
accessing input values and corresponding expected output values for the neural network (Girones Figs. 1-2 and sec. 2.1 disclose the training of the network by applying a pattern to the input layer and propagating the signal forwards, then computing the error between an expected output value ti and the actual output value yi and backpropagating that error throughout the network); 
computing, by the array of systolic processing units, a forward propagation of the input values through layers of the neural network to generate predicted output values (Girones Fig. 2 and sec. 2.1 indicate that the pattern aiK is applied to the input layer and the signal is propagated forward through the network until the final outputs have been calculated), wherein performing the forward propagation includes: 
Girones sec. 2.1 and Fig. 2 disclose that outputs ali are calculated by applying an activation function f to a linear combination of the weights of the layer and the outputs from the previous layer; Fig. 4 and sec. 3.1 disclose that the processing elements of the system perform the sigmoid function of the multiply and accumulate output [thereby generating an activation output]); and 
generating … tagging information representing which of the systolic processing units computed which particular activation output value (Girones sec. 2.1 and Fig. 2 disclose that the input signal is propagated forwards through the network by linearly combining the weights from layer l and the outputs from layer l – 1 and taking a function of the result [l = tag representing which processing unit/node performed the activation function]); 
computing, by the array of systolic processing units, a backward propagation of differences between the expected output values and corresponding ones of the predicted output values, wherein computing the backward propagation is based at least partly on the tagging information (Girones Fig. 2 and sec. 2.1 disclose that the error between the expected value and the actual value is calculated and the errors are propagated backwards using delta values for layer l – 1 that are a function of the linear combination of weights and inputs for layer l – 1 [so the calculations are based on which layer of the network the system uses, labeled/tagged as l]); and 
updating the weights based on the backward propagation (Girones sec. 2.1-2.2 indicate that the change in each weight in layer l is based on the delta value for layer l computed by backpropagation).”
Girones appears not to disclose explicitly the further limitations of the claim.  However, Baji discloses “generating, by each of the particular systolic processing units, tagging information (a neural net processor comprising a multiplicity of systolic processor elements [systolic processing units] comprises, in each processor element, an address counter that generates an address [tagging information] of the coefficient memory and increments the address in synchronization with non-overlap two-phase clocks – Baji, col. 14, ll. 30-45; see also Figs. 16 (showing the address counter 36 in each processing element) and 2 (showing the entire signal processor including multiple processor elements)) ….”
Baji and the instant application both relate to hardware implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Girones to generate tag information at each individual processing unit, as disclosed by Baji, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system is aware from which part of the system each signal is coming.  See Baji, col. 14, ll. 30-45.

Regarding claim 13, Girones, as modified by Baji, discloses that “the systolic processing units are arranged in arrangements, the method further comprising assigning particular arrangements to particular layers of the neural network (Girones Figs. 2 and 4 show that the processing units are arranged in patterns and that the processing units are used to implement layers of the network, each of which has a particular arrangement).” 

Regarding claim 14, Girones, as modified by Baji, discloses that “computing the forward propagation further comprises systolically pulsing activation outputs of a first arrangement of the arrangements to a second arrangement of the arrangements (Girones sec. 2.1 and Fig. 2 disclose pushing the output of layer l – 1 [first arrangement] and linearly combining it with the weights of layer l [second arrangement]; Fig. 4 and secs. 3.1-3.2 disclose that some processing elements perform multiply and accumulate operations and others calculate the sigmoid function and that the synapse units can perform the backward phase of a pattern in one cycle and the forward phase of another pattern in the following cycle without waiting for the end of the backward phase in the whole layer [sending the output of one layer to the next layer = systolically pulsing]).”

Girones Fig. 2 and sec. 2.1 disclose that the errors for the l – 1st layer [first arrangement] are backpropagated, inter alia, by summing a linear combination of the weights and the delta values for layer l [second arrangement]; see also sec. 3.2).”  

Regarding claim 16, Girones, as modified by Baji, discloses that “computing the backward propagation further comprises, by particular systolic processing units of the first arrangement, computing partial derivatives based at least partly on the tagging information (Girones Fig. 2 and sec. 2.1 (particularly Eq. (2)) disclose that the deltas for a previous layer l – 1 are computed by backpropagating the errors using, inter alia, the partial derivative of the activation function with respect to the previous layer’s weight/input linear combination; because this computation is performed per layer l, it is based on the tag l corresponding to that layer).”  

Regarding claim 18, Girones, as modified by Baji, discloses that “generating the tagging information comprises, by a first processing unit of the systolic processing units, tagging an activation value output from the first processing unit with an address of the first processing unit (Girones Fig. 2 and sec. 2.1 disclose that the pattern a_i^K is applied to the input layer and the signal is propagated forwards through the network until the final outputs have been calculated, and that the output a_i^l is an activation function f of the weighted sum of the weights and the outputs from the previous layer, where l is the number of the layer [tag/address of the processing unit]; see also Fig. 4 and sec. 3.1).”  

Regarding claim 19, Girones, as modified by Baji, discloses that “performing the forward propagation comprises, by a second processing unit of the systolic processing units, identifying one of the weights to use for processing the activation value based on the address of the first processing unit (Girones sec. 2.1 and Fig. 2 disclose applying a pattern a_i^K to the input layer of the network and propagating it forwards until final outputs have been calculated for each neuron i and each layer l by, inter alia, forming a linear combination of all weights w_ij and the previous layer’s outputs a_j^(l – 1) for all j from 0 to N_(l – 1) and then applying an activation function to the result [so each processing unit determines which weights w_ij to apply to neuron i in layer l by determining the addresses j of all neurons in layer l – 1 [first processing units] connected to neuron i and linearly combining the weights corresponding to those connections with the outputs from those neurons in layer l – 1]).”  

Regarding claim 20, Girones, as modified by Baji, discloses that “performing the backward propagation comprises updating the one of the weights based on the address of the first processing unit (Girones Fig. 2 and sec. 2.1 disclose that the weight between neuron i in layer l and neuron j in layer l – 1 at a time step m is the sum of the weight at time step m – 1 and a change in weight that is dependent on the delta value for layer l and neuron i [i.e., based on an address i of the neuron]).”

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Girones in view of Baji and further in view of Chen.
Regarding claim 17, Girones, as modified by Baji and Chen, discloses “accumulating a weighted sum of the partial derivatives as data are propagated backwards through the first arrangement (Chen col. 6, l. 56-col. 7, l. 8 (esp. Eq. (11)) discloses that the delta value for a hidden layer is calculated as a weighted sum of the delta values from a previous layer [which are calculated as partial derivatives of the error with respect to the weights]; the backpropagation technique then feeds these error terms back to all of the units that feed the output layer, computing a delta value for each of those units; this propagates the errors back one layer).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Girones and Baji to accumulate a weighted sum of the partial derivatives during backpropagation, as disclosed by Chen, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system See Chen, col. 6, l. 56-col. 7, l. 8.

Response to Arguments
Applicant's arguments filed December 1, 2021 (“Remarks”) have been fully considered but they are not persuasive. 
Applicant argues that the processing unit of Chen is a single entity that executes a software neural network, rather than a plurality of processing nodes that each include a separate circuit, as in amended claim 1.  Remarks at 41.  However, except insofar as the new limitation “each processing node of the plurality of processing nodes includes a separate circuit” delineates that the nodes comprise circuits, nothing in the claim prohibits the “processing nodes” (which are not defined in the specification in any closed-ended way) from being interpreted as ordinary neurons of the network.  In any event, the above limitation is taught by Pechanek, and it would have been obvious to an ordinary artisan to instantiate the network of Chen in the one-circuit-per-neuron hardware of Pechanek for the reason given above, namely performance and efficiency improvements over a pure software implementation.
Applicant then argues that Girones does not disclose that the neurons of its network generate the “l” layer identifier, that the layer identifier does not identify which neuron computed a particular activation output, and that Girones does not disclose that backward propagation is performed based on the layer identifier.  Remarks at 41-42.  However, the argument that the neurons do not generate the tags is moot in light of the addition of Baji to the rejection.  Regarding the argument that the layer identifier does not identify which neuron computed an activation output, it is noted that Equation (1) shows that the activation outputs are computed with respect both to the layer l and to the neuron i within each layer.  It is the combination of these labels that constitutes the tag, not the layer identifier alone.  Regarding the argument that the backward propagation is not performed based on the tags in Girones, it is noted that Equation (2) discloses that the delta value for neuron i in layer l – 1 is computed as a function both of the delta values in subsequent layer l and of the multiply-u for layer l – 1 and neuron i.  Therefore, backpropagation is based both on the layer tag l and on the neuron tag i.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7a-5:30p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.





/R.C.V./             Examiner, Art Unit 2125    

/KAMRAN AFSHAR/             Supervisory Patent Examiner, Art Unit 2125