DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3, 5-10, 13-14, 17-18, and 20 have been amended. Claims 2, 16, and 19 have been canceled. Claims 1, 3-15, 17-18, and 20 remain pending and have been examined.

Response to Arguments
Applicant’s amendment has overcome the previous rejections under 35 USC §§ 101 and 112, which have been withdrawn accordingly.
Applicant’s arguments, see pp. 14-15, filed 11/5/2021, with respect to the rejection(s) of claim(s) 1, 10, and 18 under 35 USC §§ 102 and 103, respectively, have been fully considered and are persuasive.  Therefore, the rejections have been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of U.S. Patent Application Publication 2010/0076915 by Xu et al. and U.S. Patent Application Publication 2016/0210550 by Merrill et al. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 7-13, 15, 17-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2012/0166374 by Moussa et al. (“Moussa”) in view of U.S. Patent Application Publication 2010/0076915 by Xu et al. (“Xu”) and U.S. Patent Application Publication 2016/0210550 by Merrill et al. (“Merrill”).

In regard to claim 1, Moussa discloses:
1. Processing circuitry for a deep neural network, the processing circuitry comprising: See Moussa, at least ¶ 0150, e.g. “The system components may be implemented in software, by hardware description languages or by various hardware platforms.”
a central processing unit (CPU); See Moussa, ¶ 0060, e.g. “computer processors.”
a field programmable gate array (FPGA) with … memory; See Moussa, ¶ 0060, e.g. “FPGAs.” Also see ¶ 0071, e.g. “These results allow for the prediction of the impact of a specific arithmetic representation on both convergence speed and FPGA configurable resources, such as for example slices, multipliers, configurable routing and memory used.”
Moussa does not expressly disclose an on-chip static random access memory (SRAM). However, this is taught by Xu.  See Xu, Fig. 1, elements 106 and 124, depicting a system including CPU 132 connected to an FPGA with on-chip memory. Also see ¶ 0157, e.g. “SRAM and RAM/registers within the FPGA.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use 
Moussa does not expressly disclose dynamic random access memory (DRAM) remote to the FPGA and the CPU. However, Xu teaches remote dynamic random access memory. See Xu, Fig. 1 and ¶ 0039, e.g. “The accelerator system 100 may include an acceleration device 101 comprising a Peripheral Component Interface (PCI) board 104 with a Field-Programmable Gate Array (FPGA) 106 and onboard memory 108, which can be any suitable RAM (including SRAM and/or SDRAM) such as DDR, DDR2 or DDR3, and so forth.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s CPU and FPGA with Xu’s SDRAM in order to utilize hierarchical memory as suggested by Xu and known in the art to provide efficient resource utilization (see Xu, ¶ 0157).
input/output ports; and See Moussa, ¶ 0032, e.g. “input port” and “output port.” 
a plurality of neural network neurons organized into layers and implemented by the FPGA, each layer including at least one neuron, the layers organized from a first layer to a second, hidden layer, to a third, last layer, at least one neuron from the first layer is coupled to one or more of the input ports and to at least one neuron of at least one higher numbered layer and the output ports, at least one neuron from the second layer is coupled to at least one of the input ports and the at least one neuron of the first layer and the at least one neuron of at least one higher numbered layer and output ports, and at least one neuron of the third layer is coupled to the input ports and at least one neuron of at least one lower-numbered layer and to the output ports, See Moussa, ¶ 
each of the plurality of neural network neurons including a weighted computational unit implemented by the CPU  to interleave forward propagation of computational unit input values from the first layer to the last layer and backward propagation of output error values from the last layer to the first layer. See Moussa, ¶ 0083-0084, e.g. “Error Back-propagation Computation: … Starting with the output layer, and moving back toward the input layer.” Also see Fig. 6 along with ¶ 0134, e.g. “The back propagation stream 190 of weight updates propagates at the same time as the stream of new patterns through the feed forward stage 185.” Also see Fig. 14 along with ¶ 0192 and 0194, e.g. “The neuron input 230 provides the synthesized inputs 300 to weighted sum module 310 which implements Equation (1) and determines the weighted input sum for the neuron. … At the neuron 215 level, the error back propagation stage is composed of a weight change module 325 and a weight update module 330 in connection with the weight memory 315, as explained above.”
wherein interleaving forward propagation and backward propagation includes, retrieving for a forward propagation and from the DRAM, one or more weight values associated with at least one neuron of the last layer and storing the retrieved one or more weight values in the SRAM, and while a weight value associated with the last layer is still in the SRAM from the forward propagation, backward propagating an output error value from an output of the last layer to an input of the last layer using the weight value, providing a result of the backward propagating to the input/output ports; and See Moussa, ¶ 0017, e.g. “each neuron includes: a weight memory for storing weights associated with the neuron.” See Fig. 14, element 315 “weight memory.” Also see ¶ 0076, e.g. “The feed forward computation step uses internal weights (not shown) associated with each neuron 105 for calculating the neuron's 105 output. The error BP computation step compares the network's 100 overall output 125 to a target (not shown), computes an error gradient, and propagates the error through layers 110 by adjusting the neuron 105 weights to correct for it.” Also see ¶ 0196, e.g. “That is, the weight change module 325 reads the currents weights from the weight memory 315 and updates them using the received weight changes. These updated weights are then stored in the weight memory 315 for recall.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s FPGA and weights with Xu’s DRAM and SRAM in order to enhance access speeds as suggested by Xu (see ¶ 0159).
Moussa does not expressly disclose: overwriting the one or more weight values stored in the SRAM associated with the at least one neuron with one or more values associated with another neuron after backward propagating the output error value. However, this is taught by Merrill. See Merrill, ¶ 0065, e.g. “a plurality of FPGAs, which may be reconfigured for each neural network, or layer of neural network.” Note that 

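As an illustrative aid only, the memory access pattern recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions; it is not taken from Moussa, Xu, or Merrill, and it does not characterize applicant's implementation. The names `dram`, `sram`, and `interleaved_last_layer_step`, the linear (activation-free) neuron, and the single-slot SRAM are all hypothetical.

```python
# Hypothetical model: `dram` maps a neuron id to its weight list (remote memory);
# `sram` is a single-slot dict standing in for on-chip memory. All names are assumptions.

def interleaved_last_layer_step(dram, sram, neuron_id, next_neuron_id, inputs, target):
    # Forward propagation: retrieve the last-layer weights from DRAM into SRAM.
    sram["weights"] = list(dram[neuron_id])
    output = sum(w * x for w, x in zip(sram["weights"], inputs))

    # Backward propagation while the same weights are still resident in SRAM:
    # propagate the output error from the layer's output back to its inputs.
    error = output - target
    input_errors = [w * error for w in sram["weights"]]

    # Only after the backward pass are the SRAM weights overwritten with
    # values associated with another neuron.
    sram["weights"] = list(dram[next_neuron_id])
    return output, input_errors


dram = {"n_last": [0.5, -1.0], "n_other": [2.0, 3.0]}
sram = {}
output, input_errors = interleaved_last_layer_step(
    dram, sram, "n_last", "n_other", inputs=[2.0, 1.0], target=1.0
)
```

In this sketch the weights are read from the modeled DRAM once and reused for both passes before being overwritten, which is the ordering the claim language describes.
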
In regard to claim 3, Moussa discloses:
3. The processing circuitry of claim 1, wherein each of the weighted computational units includes: 
circuitry to multiply a plurality of computational unit input values by corresponding weight values of the weight values to produce a plurality of weighted input values, the plurality of computational unit input values received from an input port of the input ports and/or at least one neuron of the lower-numbered layers; See Moussa, Fig. 14 and ¶ 0192 along with Eq. 1 in ¶ 0079, depicting multiplication of input values by weight values.
circuitry to perform a computational function on the plurality of weighted input values to produce a plurality of computational function results; See Fig. 14, element 320 along with ¶ 0193.
circuitry to transmit the plurality of computational function results to at least one neuron of a higher-numbered layer and/or at least one of the output ports; See Fig. 14, element 325.
circuitry to receive a plurality of error values from at least one of the output ports and/or a higher-numbered layer, each of the plurality of error values corresponding to a different weight value of the respective weighted computational unit; and circuitry to backpropagate the plurality of error values to at least one neuron of the lower-numbered layer when the respective weighted computational unit is not in the first layer or to the input/output ports when the respective weighted computational unit is in the first layer. See Fig. 14 along with ¶ 0195, e.g. “The weight change module 325 generally performs the calculation of Equation (5) to determine the weight changes for the neuron 215, using the output of the previous layer (or this neuron's input 230) and the local gradient provided in the back propagation data path from its layer's error back propagation module 305.” Note that the alternative language of the claims allows for a broad but reasonable interpretation.

In regard to claim 4, Moussa also discloses:
4. The processing circuitry of claim 3, wherein the circuitry to backpropagate the plurality of error values includes: 
circuitry to multiply the plurality of error values by the corresponding weight values of the respective weighted computational unit to produce a plurality of backpropagating results; and See Moussa, Eq. 3 at ¶ 0084, depicting multiplication of an error gradient with weight values.
circuitry to transmit the plurality of backpropagating results to a corresponding weighted computational unit of the lower-numbered layer when the respective weighted computational unit is not in the first layer or to the input/output ports when the respective weighted computational unit is in the first layer. See Moussa, Fig. 14, generally depicting transmission of backpropagation results. Note that the alternative language of the claims allows for a broad but reasonable interpretation.
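
For illustration only, the multiplication of error values by corresponding weight values discussed in regard to claim 4 can be sketched generically as below. This is a textbook-style sketch and not a reproduction of Moussa's Eq. 3; the function name `backpropagate_errors` and the row-per-unit weight layout are assumptions made here.

```python
def backpropagate_errors(weights, errors):
    """weights[j][i] is the weight from lower-layer unit i to unit j of this layer;
    errors[j] is the error value at unit j. Returns one result per lower-layer unit."""
    n_lower = len(weights[0])
    results = [0.0] * n_lower
    for j, row in enumerate(weights):
        for i, w in enumerate(row):
            # Each error value is multiplied by its corresponding weight and
            # accumulated for the lower-layer unit that feeds that weight.
            results[i] += w * errors[j]
    return results


weights = [[1.0, 2.0], [3.0, 4.0]]
errors = [0.5, -1.0]
backprop_results = backpropagate_errors(weights, errors)
```

Each entry of `backprop_results` is what would be transmitted to the corresponding unit of the lower-numbered layer, or to the input/output ports when the unit is in the first layer.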

In regard to claim 5, Moussa does not expressly disclose: 
5. The processing circuitry of claim 4, wherein: the circuitry to transmit the plurality of computational function results to a higher-numbered layer for the respective weighted computational unit includes circuitry to write the plurality of results to the SRAM; the one or more computational unit input values are received from a lower-numbered layer of the plurality of neural network layers when the respective weighted computational unit is not in the first layer by circuitry to read the computational unit input values from the SRAM; the circuitry to transmit the plurality of backpropagating results to a corresponding weighted computational unit of the lower-numbered layer when the respective weighted computational unit is not in the first layer includes circuitry to write the plurality of backpropagating results to the SRAM; and the circuitry to receive the plurality of error values from the higher-numbered layer when the respective weighted computational unit is not in the last layer includes circuitry to read the plurality of error values from the SRAM. However, Xu teaches the use of local memory to store values related to neural networks. See Xu, ¶ 0159, e.g. “Temporary data structures, such as intermediate variables, parameters, and so forth, and results, e.g., the learned model, could be stored in the onboard memory (such as the onboard 

In regard to claim 7, Moussa also discloses:
7. The processing circuitry of claim 1, further comprising: 
circuitry to compute updated weight values for each of the weighted computational units according to the backpropagated output error values; and circuitry to transmit the updated weight values to the plurality of weighted computational units. See ¶ 0083, e.g. “weights … are updated …”

In regard to claim 8, Moussa also discloses:
8. The processing circuitry of claim 1, wherein the circuitry to interleave forward propagation of computational unit input values from the first layer to the last layer and backpropagation of the output error values from the last layer to the first layer performs the backpropagation when weight values for the respective weighted computational unit are in active memory of the respective weighted computational unit, the weight values used by circuitry to perform computations by the weighted computational units during the interleaved forward propagation and the backward propagation. See Moussa, at 

In regard to claim 9, Moussa also discloses:
9. The processing circuitry of claim 1, wherein the circuitry to interleave forward propagation of computational unit input values from the first layer to the last layer and backpropagation of the output error values from the last layer to the first layer includes: 
circuitry to multiply the backward propagated error values by corresponding weight values of the respective weighted computational units to produce backpropagating multiplication results; and See Eq. 3 in ¶ 0084 depicting multiplication of backpropagation results.
circuitry to transmit the backpropagating multiplication results to connected one or more weighted computational units of a preceding layer in an order from the first layer to the last layer for respective weighted computational units that are not in the first layer or transmit the backpropagating multiplication results to the input/output port for respective weighted computational units that are in the first layer. Note that this limitation is broadly interpreted according to the alternative “or” of the claim language. See Moussa, Figs. 9 and 14, depicting backpropagation of results to neurons at each layer. 


In regard to claim 10, Moussa discloses:
10. A method for performing interleaved forward propagation and backward propagation for a deep neural network (DNN) implemented in batches on a field programmable gate array (FPGA), the DNN comprising a plurality of neural network layers coupled in order from a first layer to a last layer, each of the plurality of neural network layers including a plurality of weighted computational units, and input/output ports providing input to and receiving output from the plurality of neural network layers, the method comprising: See Moussa, Fig. 8, depicting a method. Also see Moussa, ¶ 0060, e.g. “FPGAs.” Also see Moussa, Fig. 1, depicting a deep neural network.  Also see ¶ 0099, e.g. “MLP training can be conducted using either a per-pattern or epoch (a.k.a. batch) training method.” Also see ¶ 0071, e.g. “These results allow for the prediction of the impact of a specific arithmetic representation on both convergence speed and FPGA configurable resources, such as for example slices, multipliers, configurable routing and memory used.”
Also see Fig. 1, and ¶ 0075, e.g. “Referring now to FIG. 1 there is shown an example diagram of a MLP-BP network 100 containing neurons 105 numbered 1 to N structured in a plurality of parallel layers 110 numbered 0 to M. The layers 110 include an input layer 110 (layer 0), hidden layer(s) 110 (layer(s) 1 to M-1), and an output layer 110 (layer M).” Also see Fig. 14, depicting neurons with inputs, outputs, and weighted computation units.
retrieving for the forward propagation …, respective weight values associated with the last layer; See Moussa, Fig. 14, element 315, depicting retrieval of weights. Moussa does not expressly disclose retrieving the weight values from a dynamic random access memory (DRAM) remote to the FPGA. However, Xu teaches remote dynamic random access memory. See Xu, Fig. 1 and ¶ 0039, e.g. “The accelerator system 100 may include an acceleration device 101 comprising a Peripheral Component Interface (PCI) board 104 with a Field-Programmable Gate Array (FPGA) 106 and onboard memory 108, which can be any suitable RAM (including SRAM and/or SDRAM) such as DDR, DDR2 or DDR3, and so forth.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s CPU and FPGA with Xu’s SDRAM in order to utilize hierarchical memory as suggested by Xu and known in the art to provide efficient resource utilization (see Xu, ¶ 0157).
storing the retrieved weight values in a … [memory]; See Fig. 14, element 315.
Moussa does not expressly disclose static random access memory (SRAM) on a same board as the FPGA. However, this is taught by Xu. See Xu, Fig. 1, elements 106 and 124, depicting a system including CPU 132 connected to an FPGA with on-chip memory. Also see ¶ 0157, e.g. “SRAM and RAM/registers within the FPGA.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s FPGA with Xu’s SRAM in order to increase memory access bandwidth and data locality as suggested by Xu (see ¶ 0041).
while a weight of the respective weight values associated with the last layer is still in the SRAM from the forward propagation, backward propagating an output error value from an output of the last layer to an input of the last layer; providing a result of the backward propagating to the input/output ports; and See Moussa, ¶ 0017, e.g. “each neuron includes: a weight memory for storing weights associated with the neuron.” See Fig. 14, element 315 “weight memory.” Also see ¶ 0076, e.g. “The feed forward computation step uses internal weights (not shown) associated with each neuron 105 for calculating the neuron's 105 output. The error BP computation step compares the network's 100 overall output 125 to a target (not shown), computes an error gradient, and propagates the error through layers 110 by adjusting the neuron 105 weights to correct for it.” Also see ¶ 0196, e.g. “That is, the weight change module 325 reads the currents weights from the weight memory 315 and updates them using the received weight changes. These updated weights are then stored in the weight memory 315 for recall.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s FPGA and weights with Xu’s DRAM and SRAM in order to enhance access speeds as suggested by Xu (see ¶ 0159).
Moussa does not expressly disclose: overwriting the one or more weight values stored in the SRAM associated with the at least one neuron with one or more values associated with another neuron after backward propagating the output error value. However, this is taught by Merrill. See Merrill, ¶ 0065, e.g. “a plurality of FPGAs, which may be reconfigured for each neural network, or layer of neural network.” Note that reconfiguration with a new neural network layer includes association of the FPGA with new neurons. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Xu’s weight storage with Merrill’s 

In regard to claim 11, parent claim 10 is incorporated. All further limitations have been addressed in the above rejection of claims 3 and 9.

In regard to claim 12, parent claim 11 is addressed above. All further limitations have been addressed in the above rejection of claim 4. 

In regard to claims 13, 15, and 17, parent claims 10 and 12 are addressed above. All further limitations have been addressed in the above rejections of claims 5, 7, and 9, respectively.

In regard to claim 18, Moussa discloses:
18. At least one non-transitory machine-readable medium including instructions that, when executed by one or more processors, See Moussa, at least ¶ 0020, e.g. “computer readable code on a physical computer readable media that may be executed by a computing device.”
configure processing circuitry of a field programmable gate array (FPGA) to implement a deep neural network (DNN) in batches, the DNN comprising a plurality of neural network layers coupled in order from a first layer to a last layer, See Moussa, Fig. 1, depicting a deep neural network. Also see Moussa, ¶ 0060, e.g. “FPGAs.” Also see ¶ 
each of the plurality of neural network layers including a plurality of weighted computational units, and input/output ports providing input to and receiving output from the plurality of neural network layers, See Fig. 1, depicting layers of neurons. Also see Fig. 14, depicting neurons including weighted computation units 310 and inputs and outputs.
wherein the one or more processors configure the processing circuitry to: interleave forward propagation of computational unit input values from the first layer to the last layer and backpropagation of output error values from the last layer to the first layer. See Moussa, Figs. 6 and 14 along with ¶ 0083-0084, 0134, 0192, and 0194 as cited above in the rejection of claim 1.
All further limitations of claim 18 have been addressed in the above rejection of claim 1. 

In regard to claim 20, parent claim 18 is addressed above. All further limitations have been addressed in the above rejections of claims 3-4.

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Moussa in view of Xu and Merrill as applied above, and further in view of U.S. Patent Application Publication 2017/0301063 by Merhav et al. (“Merhav”).

In regard to claim 6, Moussa also discloses:
6. The processing circuitry of claim 4, further comprising: 
circuitry to subtract a corresponding predefined desired result from each of the plurality of computational function results transmitted to the input/output ports by [a] weighted computational unit in the last layer to determine the plurality of error values for the [weighted] computational unit in the last layer; See Moussa, Eq. 3 in ¶ 0084.
circuitry to transmit the plurality of error values to the processing circuitry for the weighted computational unit in the last layer; and See Moussa, Fig. 14, elements 325 and 330 along with ¶ 0190.
for each of the weighted computational units: 
circuitry to multiply the plurality of transmitted backpropagating results by a multiplication factor to determine a plurality of multiplying results; and See Eq. 3 in ¶ 0084 depicting multiplication of backpropagation results.
Moussa does not expressly disclose: circuitry to subtract the plurality of multiplying results from the corresponding weights for the respective weighted computational unit to determine updated weights for the respective weighted computational unit. However, this is taught by Merhav. See Merhav, ¶ 0058, e.g. “a ratio of the gradient is subtracted from the weight.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Moussa’s backpropagation with Merhav’s weight calculation in order to influence speed and quality of learning, as suggested by Merhav.
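
For illustration only, the arithmetic recited in claim 6 can be sketched as follows. The sketch assumes a single weighted computational unit, and the names `output_errors`, `update_weights`, and `factor` are hypothetical; the code is not taken from Moussa or Merhav.

```python
def output_errors(results, desired):
    # Subtract the predefined desired result from each computational function result.
    return [r - d for r, d in zip(results, desired)]


def update_weights(weights, backprop_results, factor):
    # Multiply each backpropagating result by a multiplication factor, then
    # subtract the product from the corresponding weight.
    return [w - factor * b for w, b in zip(weights, backprop_results)]


errs = output_errors([1.0, 0.25], [0.5, 0.75])
new_weights = update_weights([1.0, 2.0], [2.0, -4.0], factor=0.25)
```

The `factor` argument plays the role of the claimed multiplication factor; the final subtraction from the weights is the step for which Merhav is relied upon above.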

In regard to claim 14, parent claim 12 is addressed above. All further limitations have been addressed in the above rejection of claim 6. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
"RRANN: a hardware implementation of the backpropagation algorithm using reconfigurable FPGAs" by Eldredge et al. teaches FPGA implementation of simultaneous feed forward and backpropagation (see section 4 on p. 2100).
U.S. Patent Application Publication 2016/0267380 by Gemello et al. See ¶ 0042, e.g. “As the activation computations in the feed-forward direction and the error computations in the backpropagation are out of sync, queues of activations from the feed-forward direction may be kept to compute the weight variations with the corresponding activations and errors. The activation queues can be used to compensate for the delay introduced by the resulting lack of synchronization of the weights used in the forward and backward propagation caused by the delayed network weight updates introduced by the pipeline.”
U.S. Patent Application Publication 2017/0039472 by Kudo. See ¶ 0044, e.g. “By storing learning weight in the (local) non-volatile memory serving as the second memory 230, the performance time of the multilayer neural network task processing can be reduced in comparison with the case where the learning weight is loaded from an external storage every time.”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/James D. Rutten/
Primary Examiner, Art Unit 2121