DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 29 June 2022, in response to the Office Action mailed 25 February 2022.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 25 April 2022 has been entered.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 6-10, 12, 13, 15, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Reagen et al. (Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, June 2016, pgs. 267-278), in view of Esterline (US 2013/0041859), and further in view of Chung et al. (Deep Neural Network Using Trainable Activation Functions, July 2016, pgs. 348-352).

As per claim 1, Reagen teaches a method for training a neural network comprising: providing at least one model of the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware combined with the DNN model (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], the at least one model being specific to hardware of the neural network [a detailed model of the hardware is used to simulate the hardware accelerator that is to run the NN (sections 3, 5, etc.)]; iteratively training the neural network using the at least one model to provide at least one output for the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], each iteration using at least one output of a previous iteration and a current model of the at least one model [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model using the output from the prior stages/iterations (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling)], wherein an activation function for the iteratively training step is a function of the current model and a bias [the neural network is iteratively trained (i.e. each model updated based on a prior model) (fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN) including writing back the model and updating and wherein the activation function includes the prior weights from memory and adding a bias (fig. 6, etc.)].
While Reagen teaches using a model specific to the hardware, and iteratively training a model with a bias (see above) it does not explicitly teach that the model is a continuously differentiable model, and wherein an activation function for the iteratively training step is a function of the current continuously differentiable model and a bias, wherein the bias is determined each iteration and the current continuously differentiable model used in the activation function is different for each iteration.
Esterline teaches providing at least one continuously differentiable model of the neural network [using a neural network wherein the activation function is continuous and differentiable (paras. 0009-10, etc.)], wherein an activation function for the iteratively training step is a function of the current continuously differentiable model and a bias, wherein the bias is determined each iteration [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (paras. 0010, 0077-79, etc.) where the bias may also have its own weight (para. 0070, etc.) and where the variables of the activation function including the bias may be iteratively calculated during the training (para. 0089, etc.)].
Reagen and Esterline are analogous art, as they are within the same field of endeavor, namely optimizing machine learning models.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a continuously differentiable model for the neural network, as well as iterative updating of the weights and bias, as taught by Esterline, as the model of the neural network used in the system of Reagen.
Esterline provides motivation as [the activation function may advantageously be continuous and differentiable, such as a sigmoid function (para. 0010, etc.) so that with respect to a sigmoid-based activation function, known characteristics of such functions may be leveraged to intelligently select starting values that allow for potentially better and/or more efficient solutions to be obtained (para. 0079, etc.)].
Chung teaches providing at least one continuously differentiable model of the neural network, wherein an activation function for the iteratively training step is a function of the current continuously differentiable model [the activation functions are approximated in an infinitely differentiable Taylor series of the function (pg. 349, section III.A)] and the current continuously differentiable model used in the activation function is different for each iteration [the activation functions of the neural network are trained, being updated during each training epoch, using a series of approximations for the activation functions (pg. 348, abstract; pgs. 349-350, sections III.A-B; pg. 352, section V; etc.)].
Reagen/Esterline and Chung are analogous art, as they are within the same field of endeavor, namely optimizing machine learning models.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the continuously differentiable function approximations for training activation functions, as taught by Chung, in training of the model including continuously differentiable activation functions in the system taught by Reagen/Esterline.
Chung provides motivation as [the best activation function for any specific task domain is difficult to determine, so allowing training of the activation function along with other parameters can provide a better fit (pg. 348, abstract and section I; etc.)].
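For illustration only, the general idea of an activation function that is expressed as a truncated Taylor series and whose coefficients are updated at each training step may be sketched as follows. This sketch is not taken from Chung or from any other reference of record; it assumes a scalar input, a squared-error loss, and coefficients initialized from the Taylor expansion of the sigmoid about zero (1/2 + x/4 - x^3/48). The names poly_activation and update_coeffs are hypothetical.

```python
from typing import List


def poly_activation(x: float, coeffs: List[float]) -> float:
    """Evaluate a polynomial activation f(x) = sum_k coeffs[k] * x**k."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


def update_coeffs(coeffs: List[float], x: float, target: float, lr: float) -> List[float]:
    """One gradient-descent step on the coefficients for loss 0.5 * (f(x) - target)**2.

    The derivative of the loss with respect to coeffs[k] is (f(x) - target) * x**k,
    so the activation used in the next iteration differs from the current one.
    """
    error = poly_activation(x, coeffs) - target
    return [a - lr * error * x ** k for k, a in enumerate(coeffs)]


# Assumed starting point: Taylor coefficients of the sigmoid about 0, up to x**3.
initial_coeffs = [0.5, 0.25, 0.0, -1.0 / 48.0]
next_coeffs = update_coeffs(initial_coeffs, x=1.0, target=1.0, lr=0.1)
```

Because the polynomial is differentiable in both its input and its coefficients, the coefficients can be trained by the same gradient-based procedure that is used for the weights and biases.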

As per claim 2, Reagen/Esterline/Chung teaches wherein a first continuously differentiable model of the at least one continuously differentiable model is a software model [the optimization of the design includes software models for simulation (Reagen: sections 3, 8, etc.) where the activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010, etc.) and the activation functions are approximated in an infinitely differentiable Taylor series of the function (Chung: pg. 349, section III.A)].

As per claim 3, Reagen/Esterline/Chung teaches wherein each current continuously differentiable model of the neural network provides a closer approximation to the hardware of the neural network than a previous continuously differentiable model [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model using the output from the prior stages/iterations (Reagen: fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7 for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling) and where the activation functions are updated during training to provide a better approximation for the specific task (Chung: pg. 348, abstract-section I; etc.)].

As per claim 4, Reagen/Esterline/Chung teaches wherein the iteratively training further includes: performing back propagation using the at least one output to obtain at least one weight, the at least one weight being used for a next iteration [a neural network is trained by iteratively adjusting weights to minimize a loss function over labeled data (Reagen: appendix A, etc.) where the weights are trained using backpropagation (Esterline: paras. 0077-79; Chung: pg. 348, abstract; etc.)].

As per claim 6, Reagen/Esterline/Chung teaches wherein the continuously differentiable model is based on at least one weight and at least one input for the training step [each of the plurality of artificial neuron modules utilizes at least one of the weights and each of the input signals as variables of a non-linear activation function, wherein each of the plurality of artificial neuron modules provides an output that is based at least in part on a solution of the non-linear activation function. The activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010; etc.)].

As per claim 7, Reagen/Esterline/Chung teaches wherein an activation function for the iteratively training step is a function of the current continuously differentiable model, the at least one input, a realism parameter for the current continuously differentiable model and a bias [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 8, Reagen/Esterline/Chung teaches wherein the iteratively training step further includes: determining the at least one bias is determined using the current continuously differentiable model [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (the bias is being learned iteratively so it is based on one model for the next model) (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 9, Reagen/Esterline/Chung teaches wherein the iteratively training step further includes: determining the at least one bias is determined using at least one other continuously differentiable model different from the current continuously differentiable model [the weights may be trained using backpropagation, a supervised learning method, and a signal from a reference system (realism parameter) is fed back into the training system, and an error signal representing the difference between the reference signal and the output signal is analyzed in view of one or more inputs; including solving for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency over temperature as closely as possible (the bias is being learned iteratively so it is based on one model for the next model) (Esterline: paras. 0010, 0077-79, etc.)].

As per claim 10, Reagen/Esterline/Chung teaches wherein the neural network utilizes at least one discrete weight [the optimization includes determining a fixed-point bitwidth for the weights (making them discrete) of the neural network (Reagen: section 5.2, etc.)].
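For illustration only, the way in which a fixed-point bitwidth makes a weight discrete may be sketched as follows. This sketch is not Reagen's implementation; it assumes a signed fixed-point format in which int_bits counts the integer bits including the sign bit and frac_bits counts the fractional bits, with out-of-range values saturated. The function name quantize_fixed_point is hypothetical.

```python
def quantize_fixed_point(value: float, int_bits: int, frac_bits: int) -> float:
    """Map a real-valued weight to the nearest representable signed fixed-point value."""
    if int_bits < 1 or frac_bits < 0:
        raise ValueError("int_bits must be >= 1 (sign bit) and frac_bits must be >= 0")
    scale = 1 << frac_bits                 # number of steps per unit
    total_bits = int_bits + frac_bits
    q_min = -(1 << (total_bits - 1))       # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1    # most positive integer code
    q = round(value * scale)               # nearest integer code
    q = max(q_min, min(q_max, q))          # saturate instead of wrapping around
    return q / scale
```

With 2 integer bits and 2 fractional bits, for example, every weight is restricted to one of 16 values spaced 0.25 apart between -2.0 and 1.75.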

As per claim 12, Reagen/Esterline/Chung teaches wherein the scaled sigmoid is given by σ = 1/(1 + e^(-ω/ω_SC)) [a system may incorporate a combination of different types of activation functions including different types of sigmoids (Esterline: para. 0066 and TABLE A, which shows the sigmoid of the claimed equation)].
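For illustration only, reading the recited expression as σ = 1/(1 + e^(-ω/ω_SC)), the scaled sigmoid may be sketched as follows. The sketch assumes that ω is a scalar input and that ω_SC is a positive scale parameter; the function and parameter names are hypothetical and are not drawn from the application or from Esterline.

```python
import math


def scaled_sigmoid(omega: float, omega_sc: float) -> float:
    """Return 1 / (1 + exp(-omega / omega_sc)) for a positive scale omega_sc."""
    if omega_sc <= 0:
        raise ValueError("omega_sc is assumed to be positive")
    z = omega / omega_sc
    # Use the algebraically equivalent form for negative z to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Under these assumptions a smaller ω_SC gives a steeper transition that approaches a step function, while the function remains continuous and differentiable for any positive ω_SC.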

As per claim 13, Reagen/Esterline/Chung teaches wherein the iteratively training step further includes: validating the at least one weight using a final continuously differentiable model of the at least one continuously differentiable model after each iteration [a final neural network model is optimized and the optimizations validated (Reagen: section 3, etc.)].

As per claim 15, Reagen/Esterline/Chung teaches wherein the iteratively training step further includes applying the current continuously differentiable model to at least one activation and the at least one weight [each of the plurality of artificial neuron modules utilizes at least one of the weights and each of the input signals as variables of a non-linear activation function, wherein each of the plurality of artificial neuron modules provides an output that is based at least in part on a solution of the non-linear activation function. The activation function may advantageously be continuous and differentiable, such as a sigmoid function (Esterline: para. 0010, etc.)].

As per claim 16, Reagen/Esterline/Chung teaches wherein the iteratively training step occurs off-chip for the neural network, the method further comprising: providing at least one final output to the neural network [Minerva includes a 5 stage method of optimizing neural networks and hardware for running a neural network, including initial model training and several stages of optimization using a PPA model of the hardware for simulation to run and optimize the DNN design and the hardware model (Reagen: fig. 2 and section 2; see also sections 3.1, 3.4, 4, and 6-7  for training the NN; and sections 3.3, 5, 8.4, 9, etc. regarding the hardware modelling); so the training is performed on a different system than the one that will be running the NN].

As per claim 19, see the rejection of claim 1, above, wherein Reagen/Esterline/Chung also teaches a neural network training system implemented using at least one computing device including at least one processor and memory [the training is performed by at least one GPU and stored in memory (Reagen: sections 3, 9, etc.)].


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Reagen, Esterline, and Chung as applied to claim 4 above, and further in view of Sather (US 10,592,732).

As per claim 5, Reagen/Esterline/Chung teaches the method of claim 4, as described above.
While Reagen/Esterline/Chung teaches iterative training (see above) it does not explicitly teach wherein the step of iteratively training further includes: terminating if the at least one weight is not more than at least one threshold different from at least one previous weight.
Sather teaches wherein the step of iteratively training further includes: terminating if the at least one weight is not more than at least one threshold different from at least one previous weight [a neural network may be trained in a repeated/iterative training process including stopping training once weights have changed by less than a threshold for a particular number of iterations (col. 11, line 57-col. 12, line 6, etc.)].
Reagen/Esterline/Chung and Sather are analogous art, as they are within the same field of endeavor, namely neural network training/optimization.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to stop the iterative neural network training process once weights have changed by less than a threshold, as taught by Sather, for determining the stopping point of the iterative training in the system taught by Reagen/Esterline/Chung.
Because both Reagen/Esterline/Chung and Sather teach iterative training of a neural network, and Reagen/Esterline/Chung does not explicitly describe what criterion is used to determine when training should be terminated, it would have been obvious to one of ordinary skill in the art to stop the iterative neural network training process once weights have changed by less than a threshold, as taught by Sather, for determining the stopping point of the iterative training in the system taught by Reagen/Esterline/Chung, to achieve the predictable result of knowing when to finish training, avoiding overfitting, etc.
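For illustration only, a termination test of the kind recited in claim 5 (stop when no weight differs from its previous value by more than a threshold) may be sketched as follows. This sketch is not Sather's implementation; it assumes a flat list of scalar weights and a caller-supplied update function, and the names should_terminate, train, and update_step are hypothetical.

```python
from typing import Callable, List, Tuple


def should_terminate(prev: List[float], new: List[float], threshold: float) -> bool:
    """True when every weight changed by no more than the threshold."""
    if len(prev) != len(new):
        raise ValueError("weight lists must have the same length")
    return all(abs(n - p) <= threshold for p, n in zip(prev, new))


def train(
    weights: List[float],
    update_step: Callable[[List[float]], List[float]],
    threshold: float,
    max_iters: int = 1000,
) -> Tuple[List[float], int]:
    """Apply update_step repeatedly; stop early once the weights have settled."""
    for iteration in range(1, max_iters + 1):
        new_weights = update_step(weights)
        if should_terminate(weights, new_weights, threshold):
            return new_weights, iteration
        weights = new_weights
    return weights, max_iters


# Hypothetical update that halves each weight, standing in for a real training step.
final_weights, iterations = train([1.0], lambda w: [0.5 * v for v in w], threshold=0.1)
```

In this toy run the change per iteration is 0.5, 0.25, 0.125, and then 0.0625, so the loop stops on the fourth iteration, the first at which the change does not exceed the threshold of 0.1.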


Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Reagen, Esterline, and Chung as applied to claim 13 above, and further in view of Martinez (US 2019/0114824).

As per claim 14, Reagen/Esterline/Chung teaches the method of claim 13, as described above.
While Reagen/Esterline/Chung teaches validating the model (see above) it does not explicitly teach wherein the validating step includes applying a process noise model.
Martinez teaches wherein the validating step includes applying a process noise model [training data may be divided into a training set and a validation set. In each of these two sets, data augmentation may be performed. Generally, augmentation may include algorithm treatment for noise in data, or missing data, as well as handling a variable number of training samples (para. 0063, etc.)].
Reagen/Esterline/Chung and Martinez are analogous art, as they are within the same field of endeavor, namely neural network optimization/training.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a noise model in the validation of the neural network, as taught by Martinez, as part of the neural network validation in the system taught by Reagen/Esterline/Chung.
Martinez provides motivation as [providing an algorithm for noise, missing data, and variable sizes improves the training and validation of the neural network (paras. 0063-65, etc.)].


Allowable Subject Matter
Claims 11, 17, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
While the cited art teaches various models for optimizing a neural network including modeling specific hardware (see above) none of the cited art appears to explicitly teach or suggest the specific models recited in these claims.
Claim 18 is allowable over the cited art for the same reason.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-10, 12, 13, 15, 16, and 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claim 18 is allowed; claims 11, 17 and 20 are objected to; claims 1-10, 12-16, and 19 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sayyarrodsari (US 2017/0205813) and Su (US 11,043,205) – disclose systems/methods utilizing continuously differentiable models.
Gaborski (US 5,052,043) – discloses a sigmoid based activation function that includes weights and a bias that are each updated during each iteration, based on prior iteration value.
LaBorde (US 10,777,306) – discloses a system using sigmoid activation functions, including adding new layers with new activation functions during training epochs.
Andoni (US 9,785,886 and US 2018/0314938) – disclose systems using a GA to produce new models for each of a number of epochs, including sigmoid activation functions.
Kamyshanska et al. (The Potential Energy of an Autoencoder, Oct 2014, pgs. 1261-1273) – discloses an autoencoder with different types of activation functions, including continuously differentiable activation functions, for different nodes.
Njikam et al. (A novel activation for multilayer feed-forward neural networks, Jan 2016, pgs. 75-82) – discloses different types of continuously differentiable activation functions.
Grelsson et al. (Improved Learning in Convolutional Neural Networks with Shifted Exponential Linear Units (ShELUs), Aug 2018, pgs. 517-522) – discloses different types of continuously differentiable activation functions.
Özbay et al. (A new method for classification of ECG arrhythmias using neural network with adaptive activation function, July 2010, pgs. 1040-1049) – discloses adjusting parameters of an activation function during training.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE GIROUX/Primary Examiner, Art Unit 2128