DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 15 December 2021, in response to the Office Action mailed 28 October 2021.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.

The rejections under 35 U.S.C. 112 have been withdrawn due to the amendments filed.

The objection to claim 18 has been withdrawn due to the amendments filed.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-10, 12, 13, 15, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Reagen et al. (Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, June 2016, pgs. 267-278) in view of Esterline (US 2013/0041859).

As per claim 1, Reagen teaches a method for training a neural network comprising: providing at least one model of the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware combined with the DNN model (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], the at least one model being specific to hardware of the neural network [a detailed model of the hardware is used to simulate the hardware accelerator that is to run the NN (sections 3, 5, etc.)]; iteratively training the neural network using the at least one model to provide at least one output for the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], each iteration using at least one output of a previous iteration and a current model of the at least one model [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model using the output from the prior stages/iterations (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], wherein an activation function for the iteratively training step is a function of the current model and a bias [the neural network is iteratively trained (i.e. each model updated based on a prior model) (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN), including writing back the model and updating, and wherein the activation function includes the prior weights from memory and adding a bias (fig. 6, etc.)].
While Reagen teaches using a model specific to the hardware, and iteratively training a model with a bias (see above), it does not explicitly teach that the model is a continuously differentiable model, and wherein an activation function for the iteratively training step is a function of the current continuously differentiable model and a bias, wherein the bias is determined each iteration.
Esterline teaches providing at least one continuously differentiable model of the neural network [using a neural network wherein the activation function is continuous and differentiable (paras. 0009-10, etc.)], wherein an activation function for the iteratively training step is a function of the current continuously differentiable model and a bias, wherein the bias is determined each iteration [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (paras. 0010, 0077-79, etc.) where the bias may also have its own weight (para. 0070, etc.) and where the variables of the activation function including the bias may be iteratively calculated during the training (para. 0089, etc.)].
Reagen and Esterline are analogous art, as they are within the same field of endeavor, namely optimizing machine learning models.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a continuously differentiable model for the neural network, as well as iterative updating of the weights and bias, as taught by Esterline, as the model of the neural network used in the system of Reagen.
Esterline provides motivation as [the activation function may advantageously be continuous and differentiable, such as a sigmoid function (para. 0010, etc.) so that with respect to a sigmoid-based activation function, known characteristics of such functions may be leveraged to intelligently select starting values that allow for potentially better and/or more efficient solutions to be obtained (para. 0079, etc.)].
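
For illustration only, the combination relied upon above (a continuous, differentiable sigmoid activation whose weight and bias are both re-determined on every training iteration from an error signal) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Reagen or Esterline: the single-neuron setup, the squared-error loss, and the names `sigmoid`, `train_neuron`, `lr`, and `epochs` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Continuous, differentiable activation, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(samples, lr=0.5, epochs=200):
    """Train y = sigmoid(w * x + b) by gradient descent.

    Hypothetical illustration: both the weight w and the bias b are
    updated on every iteration from the error (output minus reference).
    Returns the final weight, final bias, and per-iteration loss history.
    """
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, target in samples:
            y = sigmoid(w * x + b)
            err = y - target              # error signal vs. reference value
            loss += 0.5 * err * err
            delta = err * y * (1.0 - y)   # chain rule through the sigmoid
            grad_w += delta * x
            grad_b += delta
        w -= lr * grad_w                  # weight determined each iteration
        b -= lr * grad_b                  # bias determined each iteration
        history.append(loss)
    return w, b, history


if __name__ == "__main__":
    data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
    w, b, history = train_neuron(data)
    print(f"w={w:.3f} b={b:.3f} first loss={history[0]:.4f} last loss={history[-1]:.4f}")
```

The derivative `y * (1.0 - y)` exists at every input only because the activation is continuously differentiable, which is the property the rejection draws from Esterline.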

As per claim 2, Reagen/Esterline teaches wherein a first continuously differentiable model of the at least one continuously differentiable model is a software model [the optimization of the design includes software models for simulation (Reagen: sections 3, 8, etc.) where the activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010, etc.)].

As per claim 3, Reagen/Esterline teaches wherein each current continuously differentiable model of the neural network provides a closer approximation to the hardware of the neural network than a previous continuously differentiable model [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model using the output from the prior stages/iterations (Reagen: fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)].

As per claim 4, Reagen/Esterline teaches wherein the iteratively training further includes: performing back propagation using the at least one output to obtain at least one weight, the at least one weight being used for a next iteration [a neural network is trained by iteratively adjusting weights to minimize a loss function over labeled data (Reagen: appendix A, etc.) where the weights are trained using backpropagation (Esterline: paras. 0077-79, etc.)].

As per claim 6, Reagen/Esterline teaches wherein the continuously differentiable model is based on at least one weight and at least one input for the training step [each of the plurality of artificial neuron modules utilizes at least one of the weights and each of the input signals as variables of a non-linear activation function, wherein each of the plurality of artificial neuron modules provides an output that is based at least in part on a solution of the non-linear activation function. The activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010, etc.)].

As per claim 7, Reagen/Esterline teaches wherein an activation function for the iteratively training step is a function of the current continuously differentiable model, the at least one input, a realism parameter for the current continuously differentiable model and a bias [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 8, Reagen/Esterline teaches wherein the iteratively training step further includes: determining the at least one bias is determined using the current continuously differentiable model [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (the bias is being learned iteratively so it is based on one model for the next model) (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 9, Reagen/Esterline teaches wherein the iteratively training step further includes: determining the at least one bias is determined using at least one other continuously differentiable model different from the current continuously differentiable model [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (the bias is being learned iteratively so it is based on one model for the next model) (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 10, Reagen/Esterline teaches wherein the neural network utilizes at least one discrete weight [the optimization includes determining a fixed-point bitwidth for the weights (making them discrete) of the neural network (Reagen: section 5.2, etc.)].
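
For illustration only, the effect of selecting a fixed-point bitwidth on a weight may be sketched as follows. This is a minimal sketch under stated assumptions: the signed format (a sign bit plus `int_bits` integer bits and `frac_bits` fractional bits), the saturation on overflow, and the name `quantize_fixed_point` are hypothetical and are not taken from Reagen.

```python
def quantize_fixed_point(w: float, int_bits: int, frac_bits: int) -> float:
    """Map a real-valued weight onto a signed fixed-point grid.

    Hypothetical format: one sign bit, int_bits integer bits, and
    frac_bits fractional bits. Values are rounded to the nearest
    representable step (ties to even, per Python's round) and
    saturated at the ends of the representable range.
    """
    scale = 1 << frac_bits                      # steps per unit
    max_code = (1 << (int_bits + frac_bits)) - 1
    min_code = -(1 << (int_bits + frac_bits))
    code = round(w * scale)                     # nearest integer code
    code = max(min_code, min(max_code, code))   # saturate on overflow
    return code / scale


if __name__ == "__main__":
    for w in (0.3, 5.0, -5.0):
        print(w, "->", quantize_fixed_point(w, int_bits=1, frac_bits=2))
```

With a finite number of bits, each weight can take only one of a finite set of values, which is the sense in which a fixed-point weight is discrete.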

As per claim 12, Reagen/Esterline teaches wherein the scaled sigmoid is given by σ = 1/(1 + e^(-ω/ω_SC)) [a system may incorporate a combination of different types of activation functions including different types of sigmoids (Esterline: para. 0066 and TABLE A, which shows the sigmoid of the claimed equation)].
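
For illustration only, the recited scaled sigmoid may be evaluated as sketched below. This is a minimal sketch under stated assumptions: ω is read as a real-valued input (for example, a weight) and ω_SC as a positive scale factor, neither of which is defined in this action, and the name `scaled_sigmoid` is hypothetical.

```python
import math


def scaled_sigmoid(omega: float, omega_sc: float) -> float:
    """Evaluate sigma = 1 / (1 + exp(-omega / omega_sc)).

    Assumption: omega_sc is a positive scale factor. The two branches
    are algebraically identical and are used only to avoid overflow
    in math.exp for large-magnitude arguments.
    """
    if omega_sc <= 0:
        raise ValueError("omega_sc must be positive")
    x = omega / omega_sc
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    for scale in (1.0, 0.1, 0.01):
        print(scale, scaled_sigmoid(-1.0, scale), scaled_sigmoid(1.0, scale))
```

As the scale factor shrinks toward zero, the function remains continuously differentiable but approaches a step between 0 and 1.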

As per claim 13, Reagen/Esterline teaches wherein the iteratively training step further includes: validating the at least one weight using a final continuously differentiable model of the at least one continuously differentiable model after each iteration [a final neural network model is optimized and the optimizations validated (Reagen: section 3, etc.)].

As per claim 15, Reagen/Esterline teaches wherein the iteratively training step further includes applying the current continuously differentiable model to at least one activation and the at least one weight [each of the plurality of artificial neuron modules utilizes at least one of the weights and each of the input signals as variables of a non-linear activation function, wherein each of the plurality of artificial neuron modules provides an output that is based at least in part on a solution of the non-linear activation function. The activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010, etc.)].

As per claim 16, Reagen/Esterline teaches wherein the iteratively training step occurs off-chip for the neural network, the method further comprising: providing at least one final output to the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model (Reagen: fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7  for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling); so the training is performed on a different system than the one that will be running the NN].

As per claim 19, see the rejection of claim 1, above, wherein Reagen/Esterline also teaches a neural network training system implemented using at least one computing device including at least one processor and memory [the training is performed by at least one GPU and stored in memory (Reagen: sections 3, 9, etc.)].



Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Reagen and Esterline as applied to claim 4 above, and further in view of Sather (US 10,592,732).

As per claim 5, Reagen/Esterline teaches the method of claim 4, as described above.
While Reagen/Esterline teaches iterative training (see above), it does not explicitly teach wherein the step of iteratively training further includes: terminating if the at least one weight is not more than at least one threshold different from at least one previous weight.
Sather teaches wherein the step of iteratively training further includes: terminating if the at least one weight is not more than at least one threshold different from at least one previous weight [a neural network may be trained in a repeated/iterative training process including stopping training once weights have changed by less than a threshold for a particular number of iterations (col. 11, line 57-col. 12, line 6, etc.)].
Reagen/Esterline and Sather are analogous art, as they are within the same field of endeavor, namely neural network training/optimization.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to stop the iterative neural network training process once weights have changed by less than a threshold, as taught by Sather, for determining the stopping point of the iterative training in the system taught by Reagen/Esterline.
Because both Reagen/Esterline and Sather teach iterative training of a neural network, and Reagen/Esterline does not explicitly describe what criteria are used to determine when training should be terminated, it would have been obvious to one of ordinary skill in the art to stop the iterative neural network training process once weights have changed by less than a threshold, as taught by Sather, for determining the stopping point of the iterative training in the system taught by Reagen/Esterline, to .
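
For illustration only, a stopping criterion of the kind discussed above (terminating once the weights have changed by no more than a threshold for a particular number of iterations) may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Sather: the plain gradient-descent update and the names `train_until_stable`, `grad_fn`, `threshold`, and `patience` are hypothetical.

```python
from typing import Callable, List, Tuple


def train_until_stable(
    weights: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    lr: float = 0.1,
    threshold: float = 1e-6,
    patience: int = 3,
    max_iters: int = 10_000,
) -> Tuple[List[float], int]:
    """Gradient descent that stops once every weight differs from its
    previous value by no more than `threshold` for `patience`
    consecutive iterations. Returns the weights and iterations used."""
    stable = 0
    for it in range(1, max_iters + 1):
        grads = grad_fn(weights)
        new_weights = [w - lr * g for w, g in zip(weights, grads)]
        max_change = max(abs(n - w) for n, w in zip(new_weights, weights))
        weights = new_weights
        # Count consecutive iterations with a sufficiently small change.
        stable = stable + 1 if max_change <= threshold else 0
        if stable >= patience:
            return weights, it
    return weights, max_iters


if __name__ == "__main__":
    targets = [1.0, -2.0]

    def grad(ws: List[float]) -> List[float]:
        # Gradient of sum((w - t)^2) over the weights.
        return [2.0 * (w - t) for w, t in zip(ws, targets)]

    final, iters = train_until_stable([0.0, 0.0], grad)
    print(final, iters)
```

Comparing each weight with its value from the previous iteration requires no validation data, so such a check can be added to an existing iterative training loop without changing the update itself.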


Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Reagen and Esterline as applied to claim 13 above, and further in view of Martinez (US 2019/0114824).

As per claim 14, Reagen/Esterline teaches the method of claim 13, as described above.
While Reagen/Esterline teaches validating the model (see above), it does not explicitly teach wherein the validating step includes applying a process noise model.
Martinez teaches wherein the validating step includes applying a process noise model [training data may be divided into a training set and a validation set. In each of these two sets, data augmentation may be performed. Generally, augmentation may include algorithm treatment for noise in data, or missing data, as well as handling a variable number of training samples (para. 0063, etc.)].
Reagen/Esterline and Martinez are analogous art, as they are within the same field of endeavor, namely neural network optimization/training.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a noise model in the validation of the neural network, as taught by Martinez, in the system taught by Reagen/Esterline.
Martinez provides motivation as [providing an algorithm for noise, missing data, and variable sizes improves the training and validation of the neural network (paras. 0063-65, etc.)].



Allowable Subject Matter
Claims 11, 17, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
While the cited art teaches various models for optimizing a neural network including modeling specific hardware (see above) none of the cited art appears to explicitly teach or suggest the specific models recited in these claims.
Claim 18 is allowable over the cited art for the same reason.


Response to Arguments
Applicant's arguments filed 15 December 2021 have been fully considered but they are not persuasive.

Applicant argues that Reagen and Esterline do not teach that an activation function is a function of a current continuously differentiable model and a bias, wherein the bias is determined each iteration.
However, Reagen teaches the neural network is iteratively trained (i.e. each model updated based on a prior model) (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN), including writing back the model and updating, and wherein the activation function includes the prior weights from memory and adding a bias (fig. 6, etc.), and Esterline teaches calculating the neural network output signal may include providing the input signals and the weights to a plurality of artificial neuron modules of the neural network processing module, wherein at least one of the weights is provided to each of the plurality of artificial neuron modules. The process may further include providing a neuron output from each of the plurality of neuron modules to a linear summer module, wherein the neural network output signal is based at least in part on an output of the linear summer module. A bias input may be provided to the linear summer for shifting the output of the linear summer up or down. In certain embodiments, each of the plurality of artificial neuron modules utilizes at least one of the weights and each of the input signals as variables of a non-linear activation function, wherein each of the plurality of artificial neuron modules provides an output that is based at least in part on a solution of the non-linear activation function. The activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010), where the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs (Esterline: paras. 0077-79, etc.).


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claim 18 is allowed; claims 11, 17 and 20 are objected to; claims 1-10, 12-16, and 19 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sayyarrodsari (US 2017/0205813) and Su (US 11,043,205) – disclose systems/methods utilizing continuously differentiable models.
Gaborski (US 5,052,043) – discloses a sigmoid based activation function that includes weights and a bias that are each updated during each iteration, based on prior iteration value.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.