DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over Pescianschi (US 20160283842 A1) in view of Koswatta et al. (US 20190180174 A1).

Regarding claim 1, Pescianschi teaches a method for training a residual neural network (Pescianschi, Fig. 2 and Par. 251, neural network P-net 100 (i.e. residual neural network)) executed on one or more processing units (Pescianschi, Par. 3, a microprocessor), the residual neural network comprising a plurality of warp units connected in series (Pescianschi, Par. 133, desired output images 126 “at a layer” (i.e. a warp unit) are used for (i.e. are connected in series to) a subsequent layer (i.e. a subsequent/next warp unit) of a multi-layer p-net & Note: for more evidence on “plurality of warp units connected in series” see Fig. 3 of Koswatta), each warp unit comprising an input (Pescianschi, Fig. 2 and Par. 73, input 104), an output (Pescianschi, Fig. 2 and Par. 77, output 126), a plurality of independent residual units connected in parallel from the input to the output (Pescianschi, Figs. 2, 11 and Pars. 251-253, a plurality of electrical devices function as a plurality of corrective weights 112 (i.e. residual units) & Note: for more evidence on “plurality of independent residual units connected in parallel from the input to the output” see Fig. 3 of Koswatta), a direct connection from the input to the output (Pescianschi, Fig. 2, neuron 116), and at least one derivative unit connected from the input to the output in parallel to the residual units (Pescianschi, Fig. 2 and Par. 122, weight correction calculator 122), each residual unit comprising one or more weights (Pescianschi, Fig. 2 and Pars. 75, 79, each corrective weight 112 is defined by a respective weight value), the method comprising: initializing the one or more weights of each residual unit (Pescianschi, Fig. 2 and Par. 76, initial value of the corrective weight 112); inputting a plurality of training cases to the first warp unit in the series (Pescianschi, Fig. 23 and Pars. 88, 98-99, return to step 202 to perform additional training or training generally begins with formation of a set of training images (i.e. training cases)); using each training case to optimize the one or more weights for each residual unit in parallel in the first warp unit in the series (Pescianschi, Fig. 10 and Pars. 77, 98, modify (i.e. optimize) respective corrective weight values to minimize the deviation of the neuron sum 120 from the value of the desired output signal); starting with the output of the first warp unit in the series (Pescianschi, Fig. 23 and Pars. 101-102, desired value/image 126 is output), iteratively propagating the output of each warp unit to the input of the next respective warp unit in the series (Pescianschi, Fig. 2 and Par. 133, desired output images 126 are used for a subsequent layer of a multi-layer p-net); storing the output of the last warp unit in the series (Pescianschi, Pars. 169-172, value of desired neuron sum 120 is stored); and storing the weights for each residual unit (Pescianschi, Par. 254, each corrective weight 112 (i.e. residual units) is defined by the memory element 150 that retains/stores the respective weight value).
However, Pescianschi does not specifically mention, for each respective warp unit, using each training case to optimize the one or more weights for each residual unit in parallel in the respective warp unit.
Koswatta teaches a neural network (Koswatta, Figs. 3-5) comprising: for each hidden layer and output layer, or layer nodes (i.e. warp units), applying a weight update by adjusting (i.e. optimizing) a relevant parameter of each weight (i.e. residual unit) in each layer (Koswatta, Figs. 3-5 and Pars. 49-50, 59-61).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above teaching of Koswatta into Pescianschi to effectively update weights/parameters in each layer.
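
For orientation only, the topology recited in claim 1 as characterized in this rejection (warp units connected in series, each with parallel residual units and a direct connection from input to output) can be sketched as below. This is an illustrative reading, not the applicant's or either reference's implementation: the additive combination, the scalar signals, the tanh branch function, and the names `residual_unit`, `warp_unit`, and `forward` are all assumptions, and the derivative unit is omitted because this action does not spell out its form.

```python
import math

def residual_unit(weight, x):
    """One residual branch (assumed form): a weighted, squashed copy of the input."""
    return math.tanh(weight * x)

def warp_unit(weights, x):
    """Assumed warp unit: direct connection plus parallel residual branches, summed."""
    return x + sum(residual_unit(w, x) for w in weights)

def forward(network, x):
    """Propagate the output of each warp unit to the input of the next one in series."""
    for weights in network:
        x = warp_unit(weights, x)
    return x

# Hypothetical network: two warp units with differing numbers of residual units.
network = [[0.5, -0.25], [0.1, 0.2, 0.3]]
output = forward(network, 1.0)
```

With all weights set to zero, each assumed warp unit reduces to its direct connection and passes the input through unchanged.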

Regarding claim 2, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the method of claim 1, wherein the at least one derivative unit is determined using at least one multiplication of one of the residual units to a derivative of another one of the residual units (Koswatta, Figs. 3, 5 and Pars. 49-50, 59-61, Y8 = F(M13*Y4 + …), in which M13 (i.e. one of the residual units) is multiplied by Y4, which includes M1 (i.e. another one of the residual units), and in which the calculated error value (i.e. derivative) is used to adjust the weight M1).

Regarding claim 3, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the method of claim 1, wherein the at least one derivative unit is determined as a derivative of at least one of the residual units (Koswatta, Figs. 3, 5 and Pars. 49-50, 59-61, the calculated error value (i.e. derivative) is used to adjust the weight M1).

Regarding claim 4, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the method of claim 1, wherein the at least one derivative unit is determined as a derivative of at least one of the residual units multiplied by the input (Koswatta, Fig. 3 and Pars. 49-50, y4 = f(m1*y1 + …), in which weight M1 (i.e. one of the residual units) is multiplied by y1 (i.e. input)).
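
As a reading aid, the form of the Koswatta Fig. 3 expression cited above (a node output equal to an activation function applied to a sum of weight-times-input products) can be sketched as follows. The sigmoid activation, the three-input example, the numeric values, and the name `neuron_output` are assumptions made for illustration; the terms elided in the cited expression are not reconstructed here.

```python
import math

def sigmoid(z):
    """Logistic activation, used here only as an assumed stand-in for f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, inputs, f=sigmoid):
    """Return f(sum of weight * input), the general form of y4 = f(m1*y1 + ...)."""
    return f(sum(m * y for m, y in zip(weights, inputs)))

# Hypothetical inputs and weights; not taken from Koswatta.
y1, y2, y3 = 1.0, 0.5, -0.5
m = [0.2, -0.4, 0.1]
y4 = neuron_output(m, [y1, y2, y3])
```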

Regarding claim 10, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the method of claim 1, wherein at least two of the warp units comprise a differing quantity of residual units from each other (Pescianschi, Par. 3, weights M1-20 (i.e. residual units), Y4 and Y8 (i.e. two of the warp units)).

Regarding claim 11, Pescianschi teaches a system for training a residual neural network (Pescianschi, Fig. 2 and Par. 251, neural network P-net 100 (i.e. residual neural network)), the system comprising one or more processors (Pescianschi, Par. 3, a microprocessor) and one or more non-transitory computer storage media, the one or more non-transitory computer storage media causing the one or more processors to execute the residual neural network comprising: a plurality of warp units comprising an input and an output and a direct connection from the input to the output (Pescianschi, Par. 133, desired output images 126 “at a layer” (i.e. a warp unit) are used for (i.e. are connected in series to) a subsequent layer (i.e. a subsequent/next warp unit) of a multi-layer p-net & Note: for more evidence on “plurality of warp units connected in series” see Fig. 3 of Koswatta); for each of the warp units, a plurality of residual units connected in parallel (Pescianschi, Figs. 2, 11 and Pars. 251-253, a plurality of electrical devices function as a plurality of corrective weights 112 (i.e. residual units) & Note: for more evidence on “plurality of independent residual units connected in parallel from the input to the output” see Fig. 3 of Koswatta), each residual unit comprising one or more weights (Pescianschi, Fig. 2 and Pars. 75, 79, each corrective weight 112 is defined by a respective weight value); for each of the warp units, at least one derivative unit connected in parallel to the residual units (Pescianschi, Fig. 2 and Par. 122, weight correction calculator 122); and a warp operator to receive the inputs and outputs from each of the warp units (Pescianschi, Fig. 2 and Par. 153, neuron 116 (i.e. warp operator), input 106, output 126) and to train the residual neural network (Pescianschi, Fig. 23 and Pars. 169-171), the training comprising optimizing the weights of each residual unit (Pescianschi, Fig. 10 and Pars. 77, 98, modify (i.e. optimize) respective corrective weight values to minimize the deviation of the neuron sum 120 from the value of the desired output signal) based on a plurality of training cases (Pescianschi, Fig. 23 and Pars. 88, 98-99, return to step 202 to perform additional training or training generally begins with formation of a set of training images (i.e. training cases)).
However, Pescianschi does not specifically mention, for each of the warp units, a plurality of residual units connected in parallel.
Koswatta teaches a neural network (Koswatta, Figs. 3-5) comprising: a plurality of weights for each hidden layer and output layer, or layer nodes (i.e. warp units) (Koswatta, Figs. 3-5 and Pars. 49-50, 59-61).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above teaching of Koswatta into Pescianschi to effectively update weights/parameters in each layer.

Regarding claim 12, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the system of claim 11, wherein the weights for each residual unit in a warp unit is determined on a separate processing unit (Koswatta, Fig. 4 and Par. 56).

Regarding claim 13, the apparatus of claim 13 performs the method of claim 1, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 1 (method) for the apparatus of claim 13.

Regarding claim 14, the apparatus of claim 14 performs the method of claim 2, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 2 (method) for the apparatus of claim 14.

Regarding claim 15, the apparatus of claim 15 performs the method of claim 3, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 3 (method) for the apparatus of claim 15.

Regarding claim 16, the apparatus of claim 16 performs the method of claim 4, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 4 (method) for the apparatus of claim 16.


Claims 5-9 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Pescianschi (US 20160283842 A1) in view of Koswatta et al. (US 20190180174 A1) and in further view of Jacobsen et al. (US 20180225550 A1).

Regarding claim 5, the combination of Pescianschi and Koswatta teaches the previous claim.
However, the combination does not teach the limitations of claim 5.
Jacobsen teaches the method of claim 1, wherein the output is determined from the input using first-order Taylor Series Expansion of the residual units (Jacobsen, Pars. 73-74) and the at least one derivative unit (Jacobsen, Par. 83).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above teaching of Jacobsen into the combination of Pescianschi and Koswatta to reduce computation while still achieving recognition accuracy.
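
For readers unfamiliar with the term recited in claim 5, a first-order Taylor series expansion approximates a function near a point x0 by f(x0) + f'(x0)(x - x0). The sketch below illustrates only that general mathematical idea; it is not drawn from Jacobsen or from the application, and the tanh example function and the name `first_order_taylor` are assumptions.

```python
import math

def first_order_taylor(f, df, x0, x):
    """Approximate f(x) near x0 as f(x0) + f'(x0) * (x - x0)."""
    return f(x0) + df(x0) * (x - x0)

# Hypothetical example function: tanh, whose derivative is 1 - tanh(x)**2.
f = math.tanh
df = lambda x: 1.0 - math.tanh(x) ** 2

approx = first_order_taylor(f, df, 0.0, 0.1)  # linear estimate of tanh(0.1)
exact = f(0.1)
error = abs(exact - approx)  # small, because 0.1 is close to the expansion point
```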

Regarding claim 6, the combination of Pescianschi and Koswatta teaches the previous claim. The combination further teaches the method of claim 1, further comprising performing back propagation comprising propagating a weight gradient (Koswatta, Figs. 3, 5 and Pars. 60-61, voltage/parameter/weight (i.e. weight gradient) of output neuron 508 (i.e. last warp unit), neurons 502 (i.e. first warp unit)).
However, the combination does not specifically mention that the “voltage/parameter/weight (i.e. weight gradient)” as taught above is “of a loss function”.
Jacobsen teaches computing the derivative of the loss function with respect to the parameters α (Jacobsen, Pars. 82-83).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above teaching of Jacobsen into the combination of Pescianschi and Koswatta to compute the error.
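
To make concrete what "the derivative of the loss function with respect to the parameters" means in general, the following minimal sketch computes such a gradient for a single weight and applies one gradient-descent update. It is a generic illustration and not Jacobsen's or Koswatta's procedure: the linear model, the squared-error loss, the learning rate, and every function name are assumptions.

```python
def predict(w, x):
    """Assumed one-weight linear model."""
    return w * x

def loss(w, x, target):
    """Squared-error loss for a single training case."""
    return 0.5 * (predict(w, x) - target) ** 2

def loss_gradient(w, x, target):
    """Analytic derivative of the loss with respect to the weight w (chain rule)."""
    return (predict(w, x) - target) * x

def gradient_step(w, x, target, learning_rate=0.1):
    """Move the weight against the gradient to reduce the loss."""
    return w - learning_rate * loss_gradient(w, x, target)

# Hypothetical training case.
w, x, target = 0.5, 2.0, 3.0
before = loss(w, x, target)
w_new = gradient_step(w, x, target)
after = loss(w_new, x, target)
```

Comparing `before` and `after` shows the loss decreasing after the single update for this training case.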

Regarding claim 7, the combination of Pescianschi, Koswatta, and Jacobsen teaches the previous claim. The combination further teaches the method of claim 6, wherein the weight gradient for each warp unit is determined as a gradient of the output of the warp unit multiplied by a sum comprising the gradients of the residual units (Koswatta, Fig. 3 and Pars. 49-50, 59-61, Y8 = F(M13*Y4 + …), in which M13 (i.e. gradient of the output of the warp unit) is multiplied by Y4 (i.e. sum comprising the gradients of the residual units)) and gradients of the at least one derivative unit (Koswatta, Figs. 3, 5 and Pars. 49-50, 59-61, error value (i.e. gradients of the at least one derivative unit) is used to adjust the weight M1).

Regarding claim 8, the combination of Pescianschi, Koswatta, and Jacobsen teaches the previous claim. The combination further teaches the method of claim 7, wherein the gradient of each of the at least one derivative units comprises at least one multiplication of the gradient of one of the residual units to the gradient of another one of the residual units (Koswatta, Fig. 3 and Pars. 49-50, 59-61, Y8 = F(M13*Y4 + …), in which M13 (i.e. gradient of one of the residual units) is multiplied by Y4, which includes M1 (i.e. gradient of another one of the residual units)).

Regarding claim 9, the combination of Pescianschi, Koswatta, and Jacobsen teaches the previous claim. The combination further teaches the method of claim 8, wherein the sum further comprises the identity matrix (Koswatta, Par. 98).
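
As a reading aid for the language of claims 7-9 (a gradient of the warp-unit output multiplied by a sum comprising the gradients of the residual units, the gradients of the derivative unit, and the identity matrix), the chain rule yields a sum of that shape whenever a unit adds a direct connection to its parallel branches. The additive form below and the symbols x, y, F_i, D, and L are assumptions for illustration only; they are not taken from the application or from the cited references.

```latex
% Assumed additive warp unit: direct connection, n residual units F_i, one derivative unit D
y = x + \sum_{i=1}^{n} F_i(x) + D(x)

% Jacobian of the output with respect to the input: the direct connection contributes the identity I
\frac{\partial y}{\partial x} = I + \sum_{i=1}^{n} \frac{\partial F_i}{\partial x} + \frac{\partial D}{\partial x}

% Back propagation of a loss L through the unit: output gradient times the sum above
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\left( I + \sum_{i=1}^{n} \frac{\partial F_i}{\partial x} + \frac{\partial D}{\partial x} \right)
```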

Regarding claim 17, the apparatus of claim 17 performs the method of claim 5, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 5 (method) for the apparatus of claim 17.

Regarding claim 18, the apparatus of claim 18 performs the method of claim 6, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 6 (method) for the apparatus of claim 18.

Regarding claim 19, the apparatus of claim 19 performs the method of claim 7, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 7 (method) for the apparatus of claim 19.

Regarding claim 20, the apparatus of claim 20 performs the method of claim 8, and the claims recite the same scope of limitations. Applicant is kindly advised to refer to the rejection of claim 8 (method) for the apparatus of claim 20.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Vogels et al. US 20180293711 A1
[0065] FIG. 1 illustrates an example of a multilayer perceptron (MLP). As described above generally for neural networks, the MLP can include an input layer, one or more hidden layers, and an output layer. In some examples, adjacent layers in the MLP can be fully connected to one another. For example, each node in a first layer can be connected to each node in a second layer when the second layer is adjacent to the first layer. The MLP can be a feedforward neural network, meaning that data moves from the input layer to the one or more hidden layers to the output layer when receiving new data.

Rouhani et al. US 20210295166 A1
Fig. 3A, [0058] In some example embodiments, the machine learning controller 130 may be configured to coordinate the operations of a global machine learning model (e.g., the global neural network 150 implemented by the global machine learning engine 110) and a corresponding plurality of local machine learning models (e.g., the first local neural network 162, the second local neural network 164, the third local neural network 166, and/or the fourth local neural network 168 implemented by the local machine learning engine 145). For example, the first local neural network 162, the second local neural network 164, the third local neural network 166, and/or the fourth local neural network 168 may be trained independently. Training a local neural network may include processing or forward propagating, through the local neural network, training data that includes ground-truth labeled data. Errors (e.g., deviation from the ground truth associated with the training data) present in the output of the local neural network may be backpropagated through the local neural network. Moreover, parameters (e.g., weights, biases, and/or the like) applied by the local neural network (e.g., in processing the training data) may be adjusted (e.g., through gradient descent) in order to minimize the error present in the output of the local neural network. The machine learning controller 130 may propagate these changes by at least making corresponding changes to the same parameters at the global neural network 150.
[0043] The BP algorithm mainly includes two processes of the forward propagation of the signal of learning process and the backward propagation of the error. During the forward propagation, the sample enters from the input layer, and after processed by the activation function of the hidden layer, the results are propagated to the output layer; if the error between the actual output of the output layer and the desired output of the output layer does not meet the error requirements, the backward propagation stage of the error starts. Backward propagation is to propagate the error layer by layer back to the input layer through the hidden layer, and distribute the error to all nodes in each layer, thereby obtaining the error signals of all nodes in each layer. These error signals are used as the basis for correction. The forward propagation of the signal and the backward propagation of the error are carried out in cycles, and the weight is constantly adjusted, which is the process of network learning. This process is continued until the error of the network output is reduced to an acceptable level or until a preset number of learning times is reached.

Hekmatshoartabar et al. US 20190147329 A1

Gouding et al. US 20190147342 A1

Chen et al. US 20190230913 A1 

Das et al. US 20190236445 A1

Obradovic et al. US 20190012593 A1

Abhulimen et al. US 20120317058 A1

Lee et al. US 20180190268 A1

Gredilla US 20180082179 A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CINDY HUYEN TRANDAI whose telephone number is (571)270-1914. The examiner can normally be reached Monday-Thurs 9AM-6:30PM and Friday 8AM-Noon.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wesley L. Kim can be reached on 571-272-7867. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Cindy Trandai/
Primary Examiner, Art Unit 2648
7/19/2022