Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the amendments and remarks filed on 03/31/2022.
Claims 1-24 are pending.
Claims 1, 9, 14-15, 21, and 23 have been amended.

Response to Arguments
Applicant’s arguments, with respect to the claim objections, have been fully considered and are persuasive. Therefore, the objections set forth in the previous office action have been withdrawn. 

Applicant’s arguments, with respect to the rejection(s) of claim(s) 1-24 under 35 U.S.C. 101, have been considered but they are not persuasive. The applicant argues that the independent claims do “not recite a judicial exception[s]” and “even assuming a judicial exception was recited…a practical application of the judicial exception” was recited. Further, for claim 21 a “human being, even with pen and paper, cannot reasonably perform a training process that would make use of the subject matter disclosed…[such as] perform a training process, with the training epochs of approximately one hundred milliseconds”; and “[t]here is a real-world benefit…in determining that the training process is completed within a maximum number of preselected epochs”. Therefore, the claims overcome the 101 rejection. The examiner respectfully disagrees. 
The generic computer components (“a memory”, “at least one logic circuit”/“a processor”, and “computer-readable storage medium”) are recited at a high level of generality and do not integrate the judicial exceptions into a practical application, and their operations are able to be performed in a human mind and/or with the aid of pen and paper; thus, the additional elements amount to mere data storing and data outputting, which are forms of insignificant extra-solution activity, or merely use a computer as a tool to perform an abstract idea. Further, the argued performance of training epochs within a specific amount of time is not claimed, and the process for calculating this as defined by the claim is maintained as being able to be performed in a human mind and/or with the aid of pen and paper, does not impart operations of the additional elements that are sufficient to amount to significantly more than the judicial exception, and does not impose any meaningful limits on practicing the abstract idea. See the 35 U.S.C. 101 section for the full, updated analysis of the claim limitations necessitated by applicant’s amendments.

Applicant’s arguments, with respect to the rejection(s) of analogous claim(s) 1, 9 and 15 under 35 U.S.C. 103, have been considered but they are not persuasive. Specifically, the applicant argues that no prior art reference teaches the limitations of amended claims 1, 9 and 15, since no reference makes a distinction between “the five tuning parameters” and “the weighting parameters” as claimed; and “Li’s word error ratio…is not one of the five tuning parameters set forth in [amended] claim 1”. The examiner respectfully disagrees with all presented arguments.
In view of applicant’s amendments, and upon review of the references, Black has been found to at least imply “at least five tuning parameters” distinct from “the weighting parameters” as argued. Due to the breadth of the claim language, Black has been found to meet all requirements set forth by the claim language. Black, Col. 1, lines 26-49, Col. 3, lines 51-60, Col. 4, lines 21-49, Col. 5, lines 2-6, Col. 12, lines 14-26, and Col. 19, lines 4-20 teach that, in order to train “an ANN”, the “size of the training dataset”, associated “momentum”, an initial “adaptive learning rate”, weight distribution “epsilon”, and the number of “nodes and layers” are first determined for initialization (at least five tuning parameters). Further, Col. 4, line 64-Col. 5, line 44, Col. 11, lines 37-47, and Col. 12, lines 4-13 teach that once the neural network is initialized (at least five tuning parameters), the network is then set to process “[a]n input training pattern…for a preset number of iterations” for adjusting “[t]he values of the weights (weighting parameters) used to initialize the ANN…based on the calculated error and the adaptive learning rate” (weighting parameters separate from the at least five tuning parameters).
Further, Li paragraphs 0071, 0095, and 0140-0159 teach, before training a NN (prior to an initial training epoch), determining training iteration and stopping criteria (a number of epochs/iterations), including “max_iters (which refers to the maximum number of training iterations) (first tuning parameter), min_iters (which refers to the minimum number of training iterations) (second tuning parameter), keep_lr_iters (which refers to the number of iterations that keep the initial learning rate) (third tuning parameter)…end_halving_impr (which is used to determine when to terminate the training, for example, 0.001) (fourth tuning parameter)”, “start_halving_impr” (fifth tuning parameter), “halving_factor” (alternative fifth tuning parameter), and whether a word error ratio (WER) is “below a certain threshold” that is predetermined (alternative fifth tuning parameter). Paragraphs 0071, 0095, and 0140-0159 teach changing the “learning rate” (determining a learning rate) based on the number of training iterations and the determined “fine-tuning” parameters as mapped above (the at least five tuning parameters), the “gradient descent algorithm” results (gradient descent value), and WER results (training error threshold to determine whether to terminate training). Further, the gradient descent is changed upon detecting significant parameter changes, and the WER is calculated from training with the parameters (the training error threshold separate from the at least five tuning parameters as argued).
See the 35 U.S.C. 103 section for the full mapping of the claim limitations necessitated by applicant’s amendments.
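For illustration only, the role of the Li parameters quoted above (max_iters, min_iters, keep_lr_iters, start_halving_impr, end_halving_impr, halving_factor) can be sketched as a simple learning-rate halving schedule. The control flow below is an assumption about how parameters with these names are commonly used; it is not taken from Li's disclosure, and the class name, function name, default values, and the use of a held-out loss in place of WER are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScheduleParams:
    # Parameter names follow the quoted passage; default values are assumed.
    max_iters: int = 20
    min_iters: int = 0
    keep_lr_iters: int = 0
    start_halving_impr: float = 0.01
    end_halving_impr: float = 0.001
    halving_factor: float = 0.5


def learning_rate_schedule(losses: Sequence[float], initial_lr: float,
                           p: ScheduleParams) -> List[float]:
    """Return the learning rate used in each iteration that actually ran.

    losses[0] is the loss before training; losses[i] is the (hypothetical)
    held-out loss measured after iteration i.
    """
    lr = initial_lr
    halving = False
    used: List[float] = []
    prev = losses[0]
    last_iter = min(p.max_iters, len(losses) - 1)
    for it in range(1, last_iter + 1):
        used.append(lr)
        rel_impr = (prev - losses[it]) / prev  # relative improvement
        prev = losses[it]
        if it <= p.keep_lr_iters:
            continue  # keep the initial learning rate for the first iterations
        if halving and rel_impr < p.end_halving_impr:
            if it <= p.min_iters:
                continue  # too early to stop
            break  # improvement too small: terminate training
        if rel_impr < p.start_halving_impr:
            halving = True  # start reducing the learning rate
        if halving:
            lr *= p.halving_factor
    return used
```

In this sketch all six values are fixed before the first iteration, and the learning rate and the stopping decision are then derived from them together with the measured loss; the weights being trained never appear among the schedule parameters.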

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claims 1, 9, 15, and 21 are respectively drawn to a system, method, and a machine readable medium, hence each falls under one of the four categories of statutory subject matter (Step 1). Nonetheless, the claims are directed to a judicially recognized exception of an abstract idea without significantly more.
Claims 1, 9, 15, and 21 recite the following, or analogous, limitations: “select, prior to an initial training epoch, at least five tuning parameters to be used when training a machine learning model, the selection of the at least five tuning parameters based on a selected number of epochs; the selection of the at least five tuning parameters to cause the training of the machine learning model to be completed within a number of training epochs less than or equal to the selected number of epochs;…determining an amount of training error associated with a prior training epoch of the machine learning model, determining a gradient descent value based on the amount of training error; and determining a learning rate based on (1) the gradient descent value, and (2) the at least five tuning parameters, and (3) a training error threshold to determine whether to terminate training, the training error threshold separate from the at least five tuning parameters; update weighting parameters of the machine learning model based on the learning rate, the weighting parameters separate from the at least five tuning parameters” [claims 1, 9, and 15], and “calculating…a learning rate that is determined as a first product of a step size times a first tuning parameter times a sum of (1) a second tuning parameter times a second product of the amount of training error to a power of a third tuning parameter and (2) a fourth tuning parameter times a third product of the amount of training error to a power of a fifth tuning parameter, wherein the sum is to the power of a sixth tuning parameter, wherein the sum is divided by the gradient descent value, wherein the learning rate is based on (1) the gradient descent value, (2) the amount of training error, and (3) tuning parameters, the tuning parameters selected such that a training process is completed within a maximum number of preselected epochs;” [claim 21].
These limitations, as claimed, under their broadest reasonable interpretation, can be evaluated in a human mind or with pen and paper and/or are drawn to mathematical concepts [claim 21], except for the recitation of generic computer components (Step 2A). Other than reciting “a memory”, “at least one logic circuit”/“a processor”, and “computer-readable storage medium” [claims 1, 9, and 21] to perform the exceptions, nothing in the claims precludes the steps from practically being performed in the human mind or with pen and paper. For example, a human expert can select, prior to an initial training epoch, at least five tuning parameters to be used when training a machine learning model, the selection of the at least five tuning parameters based on a selected number of epochs (e.g. by thinking of five parameters that are influenced by a number of epochs/iterations for future tuning of an algorithm), the selection of the at least five tuning parameters to cause the training of the machine learning model to be completed within a number of training epochs less than or equal to the selected number of epochs (e.g. by thinking of five parameters that are influenced by a number of epochs/iterations for future tuning of an algorithm to be finished within the epochs/iterations), mentally determine an amount of training error associated with a prior training epoch of the machine learning model (e.g. by mentally calculating a value of inaccuracy for a mentally calculated result), mentally determine a gradient descent value based on the amount of training error (e.g. by thinking of a minimum error ratio for minimizing the mentally calculated inaccuracy), mentally determine a learning rate based on (1) the gradient descent value, (2) the at least five tuning parameters, and (3) a training error threshold to determine whether to terminate training, the training error threshold separate from the at least five tuning parameters (e.g. by mentally or with pen and paper calculating a step size to minimize error from the determined error, error ratio, error limit, and parameters), mentally calculate a learning rate that is determined as a first product of a step size times a first tuning parameter times a sum of (1) a second tuning parameter times a second product of the amount of training error to a power of a third tuning parameter and (2) a fourth tuning parameter times a third product of the amount of training error to a power of a fifth tuning parameter (e.g. by mentally, or with pen and paper, calculating out the mathematical concepts to find results), mentally determine that the sum is raised to the power of a sixth tuning parameter and that the sum is divided by the gradient descent value (e.g. by mentally, or with pen and paper, calculating out the mathematical concepts to find results), and mentally update weighting parameters of the machine learning model based on the learning rate, the weighting parameters separate from the at least five tuning parameters (e.g. by mentally, or with pen and paper, changing the weights of a neural network/model/algorithm and not the selected parameters based on the calculated results and determined step size). Further, claims 1, 9, 15, and 21 state “training” a “machine learning model”/“neural network” using the above steps; however, the “training” is also deemed a mental process since it is defined by the above mental process steps. Thus, claims 1, 9, 15, and 21 recite a mental process (Step 2A, Prong 1).
Claims 1, 9, and 21 include additional elements, “a memory”, “at least one logic circuit”/“a processor”, and “computer-readable storage medium”; however, the recitations of these elements are at a high level of generality and amount to mere data storing and data outputting, which are forms of insignificant extra-solution activity. Hence, the additional limitations, individually or in combination, are no more than mere instructions to apply the exceptions using generic computer components (i.e., “a memory”, “at least one logic circuit”/“a processor”, and “computer-readable storage medium”) and do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2; see MPEP 2106.05(f)). Furthermore, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using “a memory”, “at least one logic circuit”/“a processor”, and “computer-readable storage medium” [claims 1, 9, and 21] to perform the claimed steps amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (Step 2B). As such, claims 1, 9, 15, and 21 are not patent eligible.
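For reference, the calculation quoted from claim 21 can be written compactly. The following is one possible reading only; the symbols (\eta for the learning rate, s for the step size, a_1 through a_6 for the six tuning parameters, E for the amount of training error, and g for the gradient descent value) are assumed here for illustration and do not appear in the claims.

```latex
\[
  \eta \;=\; \frac{s \, a_1 \left( a_2 E^{a_3} + a_4 E^{a_5} \right)^{a_6}}{g}
\]
```

Under this reading the division by g is applied after the sum is raised to the power a_6. The claim language could also be read as dividing the sum by g before exponentiation; this sketch does not resolve which is intended.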
Dependent claims 2-8, 10-14, 16-20, and 22-24 are also ineligible for the same reasons given with respect to claims 1, 9, 15, and 21. The dependent claims describe additional mental processes:
mentally, or with pen and paper, determine the at least five tuning parameters such that a training process is completed within a maximum number of epochs (claims 2 and 10) (e.g. by thinking of five parameters that influence a max number of epochs/iterations)
mentally, or with pen and paper, determining a tuning parameter memory to store the at least five tuning parameters (claim 3) (e.g. by remembering the determined parameters)
mentally, or with pen and paper, determining a number of epochs that have elapsed during the iterative training, and in response to determining that the number of epochs that have elapsed meets or exceeds a maximum number of epochs, terminate the iterative training (claims 4, 11, and 16) (e.g. by thinking, or with pen and paper, of how many times the model has calculated results on the data compared to a determined maximum number of times, and stopping if it has been exceeded)
mentally, or with pen and paper, determine an amount of training error using the updated weighting parameters, and in response to a determination that the amount of training error is less than a training error threshold, terminate the iterative training (claims 5, 12, and 17) (e.g. by calculating a value of inaccuracy for a mentally calculated result)
mentally, or with pen and paper, determine the learning rate is a first learning rate, and determine a second learning rate corresponding to a subsequent epoch, the second learning rate different from the first learning rate (claims 6, 13, and 18) (e.g. by calculating a step size to minimize error for a round of model training on the data, and for a next round of model training on the data and confirming they are of different values)
mentally, or with pen and paper, determining whether the learning rate is greater than a learning rate threshold, and, in response to determining that the learning rate is greater than the learning rate threshold, set the learning rate to the learning rate threshold (claims 7, 14, and 19) (e.g. by determining that the calculated step size to minimize error is greater than a predetermined threshold and replacing the calculated step size with the threshold)
mentally, or with pen and paper, determining an input to generate an output based on the weighting parameters (claim 8) (e.g. by thinking of a value to input into the algorithm)
mentally, or with pen and paper, determining the learning rate is determined as a first tuning parameter times a sum of (1) a second tuning parameter times a second product of the amount of training error to a power of a third tuning parameter and (2) a fourth tuning parameter times a third product of the amount of training error to a power of a fifth tuning parameter, wherein the sum is to a power of a sixth tuning parameter, wherein the sum is divided by the gradient descent value (claim 20) (e.g. by mentally, or with pen and paper, calculating out the mathematical concepts to find results)
mentally, or with pen and paper, identifying wherein a first product of a first tuning parameter and a second tuning parameter is less than one (claim 22) (e.g. by mentally, or with pen and paper, calculating out the mathematical concept of the determined parameters to find results, and identifying the result is less than one)
mentally, or with pen and paper, identifying wherein a second product of the first tuning parameter and a third tuning parameter is greater than one, the first tuning parameter and the third tuning parameter used to determine the maximum number of iterations (claim 23) (e.g. by mentally, or with pen and paper, calculating out the mathematical concept of the determined parameters to find results, the parameters used to limit specific parameters including the max iterations, and identifying the result is greater than one)
mentally, or with pen and paper, identifying wherein the at least five tuning parameters are positive (claim 24) (e.g. by identifying that the determined parameters are all positive numbers)
Again, the dependent claims continue to cover the performance of the limitations in the mind as inherited from the independent claims (Step 2A, Prong 1). The dependent claims 2, 4-5, and 7-8, restating “the at least one logic circuit” of claim 1, and the dependent claims 10-14, restating the “computer-readable storage medium” and “a processor” of claim 9 to perform the steps of the dependent claims, again recite no more than generic computer components to apply the exception and do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2; see MPEP 2106.05(h)). The additional elements in the claims do not amount to significantly more than an abstract idea. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements used to perform the steps of the dependent claims amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (Step 2B). As such, dependent claims 2-8, 10-14, 16-20, and 22-24 do not amount to significantly more than an abstract idea, nor do they provide any inventive concept, and are therefore not patent eligible.
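To make the sequence of steps discussed above concrete, the following is a minimal sketch of an iterative loop that determines a training error, a gradient value, and a learning rate, caps the learning rate at a threshold, updates a weight, and terminates on either an error threshold or a maximum number of epochs. Everything in it is an assumption made for illustration: the single-weight squared-error model, the function names, the parameter values, and the particular reading of the learning-rate expression are hypothetical and are not taken from the application or from any cited reference.

```python
from typing import Sequence, Tuple


def learning_rate(step: float, error: float, grad: float,
                  a: Sequence[float]) -> float:
    """One assumed reading of the learning-rate expression; a holds six tuning parameters."""
    a1, a2, a3, a4, a5, a6 = a
    total = a2 * error ** a3 + a4 * error ** a5
    return step * a1 * total ** a6 / abs(grad)


def train(w: float, target: float, a: Sequence[float], step: float = 0.5,
          max_epochs: int = 50, error_threshold: float = 1e-6,
          lr_cap: float = 1.0) -> Tuple[float, int, str]:
    """Fit a single weight to a target value (toy model, assumed for illustration)."""
    reason = "max_epochs"
    epochs = 0
    while epochs < max_epochs:            # stop once the epoch budget is used
        error = 0.5 * (w - target) ** 2   # amount of training error
        if error < error_threshold:       # stop once the error is small enough
            reason = "error_threshold"
            break
        grad = w - target                 # gradient of the error w.r.t. w
        lr = min(learning_rate(step, error, grad, a), lr_cap)  # cap the rate
        w -= lr * grad                    # update the weight, not the tuning parameters
        epochs += 1
    return w, epochs, reason


# Tuning parameters are fixed before the first epoch and never updated.
tuning = (1.0, 1.0, 0.5, 1.0, 0.5, 1.0)
w_final, n_epochs, why = train(0.0, 3.0, tuning)
```

With the assumed values the loop ends on the error threshold well inside the epoch budget; lowering lr_cap to 0.1 slows convergence enough that the loop instead ends when max_epochs is reached.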

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 5-6, 8, 9-10, 12-13, 15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Black (US Patent 6269351), in view of Li et al (US Pub 20180046919) hereinafter Li, in view of Goel et al (US Pub 20160307098) hereinafter Goel.
Regarding claims 1, 9, and 15, Black teaches an apparatus, a non-transitory computer-readable storage medium comprising instructions which, when executed, cause a processor to, perform a method for training machine learning models [a neural network(s) as claimed in claims 9 and 15], the method comprising: memory; at least one logic circuit to implement (Col. 4, lines 10-20 and Col. 23, lines 26-33 teach an “artificial neural network training method and system” for training “ANNs” (ML models/NNs) understood to be a computer system including one or more processors (at least one logic circuit) and memories (memory) to perform the embodiments of the disclosure):
select, prior to an initial training epoch [iteration as claimed in claims 9 and 15], at least five tuning parameters to be used when training a machine learning model, the selection of the at least five tuning parameters based on a selected number of epochs [iterations as claimed in claims 9 and 15] (Col. 1, lines 26-49 teach “Prior to training (prior to an initial training epoch/iteration), an ANN is initialized by randomly assigning values to free parameters (select…at least five tuning parameters) known as weights”, wherein “a weight associated with…that particular node” and Fig. 10A-10C depicts the ANN having more than five nodes (at least five tuning parameters). Alternatively, Col. 1, lines 26-49, Col. 3, lines 51-60, Col. 4, lines 21-49, Col. 5, lines 2-6, Col. 12, lines 14-26, and Col. 19, lines 4-20 teach in order to train “an ANN” (prior to an initial training epoch), the “size of the training dataset”, associated “momentum”, an initial “adaptive learning rate”, weight distribution “epsilon”, and the amount of “nodes and layers” are first determined for initialization (select…at least five tuning parameters to be used when training a machine learning model). Further, Col. 5, lines 26-44 teach once the neural network is initialized, the network is then set to process “[a]n input training pattern…for a preset number of iterations” (selection…based on a selected number of epochs/iterations).);
the selection of the at least five tuning parameters to cause the training of the machine learning model to be completed within a number of training epochs [iterations as claimed in claims 9 and 15] less than or equal to the selected number of epochs [iterations as claimed in claim 9] (Col. 4, line 64-Col. 5, line 44, Col. 11, lines 37-47, and Col. 12, lines 4-13 teach once the neural network is initialized as mapped above [Col. 1, lines 26-49, Col. 3, lines 51-60, Col. 4, lines 21-49, Col. 5, lines 2-6, Col. 12, lines 14-26, and Col. 19, lines 4-20] (selection of the at least five tuning parameters to cause), during the training of an ANN (the training of the machine learning model) “[t]he value of the weights used to initialize the ANN (machine learning model) are adjusted based on the calculated error and the adaptive learning rate” and “fine-tuning parameters” within a “preset number of iterations (within a number of training epochs less than or equal to the selected number of epochs)”. This is understood that the system is able to track (count) the number of iterations from a plurality of iterations (iterative training) since it can train a neural network within “a preset number of iterations” (within a number of training epochs less than or equal to the selected number of epochs) and maintain training speed.); and 
train the machine learning model [neural network as claimed in claim 15] by iteratively:
determining an amount of training error associated with a prior training epoch [iteration as claimed in claims 9 and 15] of the machine learning model (Col. 4, line 64-Col. 5, line 25 teach calculating an error (determining an amount of training error) for each training iteration of an ANN (train a machine learning model [neural network as claimed in claim 15] by iteratively) including the “preceding” iteration (associated with a prior training epoch/iteration)), 
determining a gradient descent value based on the amount of training error (Col. 4, line 64-Col. 5, line 25 teach calculating (determining) an error ratio (gradient descent value) from (based on) the current iteration error and the “preceding” iteration error (amount of training error). This is shown in Col. 8, lines 54-63, Col. 9, line 46-Col. 10, line 49 and Figs. 4-6 to be used in “gradient descent” techniques as the system monitors an error surface to compare the error ratio (gradient descent value) to a threshold for adjusting a learning rate while navigating through a weight space); 
[for claims 1 and 9] determining a learning rate based on (1) the gradient descent value, (2) the at least five tuning parameters, and (3) a training error threshold to determine whether to terminate training, the training error threshold separate from the at least five tuning parameters (Col. 4, line 64-Col. 6, line 16, Col. 12, lines 4-13, and Col. 15, lines 25-48 teach an “adaptive learning rate is calculated” (determining a learning rate), for training an initialized ANN as mapped above [Col. 1, lines 26-49, Col. 3, lines 51-60, Col. 4, lines 21-49, Col. 5, lines 2-6, Col. 12, lines 14-26, and Col. 19, lines 4-20] (based on…the at least five tuning parameters), from (based on) the error ratio (gradient descent value), running a training pattern a “preset number of iterations” (the at least five tuning parameters as mapped above), and an “error goal (training error threshold, the training error threshold separate from the at least five tuning parameters)”; wherein prediction error is compared to an “error goal (training error threshold)” and if the iteration error is “less than or equal to the final error goal (training error threshold)” the training is completed (to determine whether to terminate training)); and
update weighting parameters of the machine learning model based on the learning rate (Col. 4, line 64-Col. 5, line 44 and Col. 12, lines 4-13 teach an “adaptive learning rate is calculated” (determining a learning rate), for training an initialized ANN as mapped above (based on…the at least five tuning parameters) [Col. 1, lines 26-49], from (based on) the error ratio (gradient descent value) and running a training pattern a “preset number of iterations” (the at least five tuning parameters as mapped above); wherein “[t]he value of the weights (weighting parameters) used to initialize the ANN (machine learning model) are adjusted (to update) based on the calculated error and the adaptive learning rate”), the weighting parameters separate from the at least five tuning parameters (Col. 1, lines 26-49, Col. 3, lines 51-60, Col. 4, lines 21-49, Col. 5, lines 2-6, Col. 12, lines 14-26, and Col. 19, lines 4-20 teach that, in order to train “an ANN”, the “size of the training dataset”, associated “momentum”, an initial “adaptive learning rate”, weight distribution “epsilon”, and the number of “nodes and layers” are first determined for initialization (at least five tuning parameters). Further, Col. 4, line 64-Col. 5, line 44, Col. 11, lines 37-47, and Col. 12, lines 4-13 teach that once the neural network is initialized (at least five tuning parameters), the network is then set to process “[a]n input training pattern…for a preset number of iterations” for adjusting “[t]he values of the weights (weighting parameters) used to initialize the ANN…based on the calculated error and the adaptive learning rate” (weighting parameters separate from the at least five tuning parameters).).
[for claim 15] determining, by executing an instruction with a processor, a learning rate based on (1) the gradient descent value, and (2) the amount of training error, and (3) the at least five tuning parameters; and updating weighting parameters of the neural network based on the learning rate (Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (by executing an instruction with a processor) and memories to perform the embodiments of the disclosure, such as in abstract and Col. 4, line 64-Col. 5, line 44 and Col. 12, lines 4-13 for an “adaptive learning rate is calculated” (determining…a learning rate), for training an initialized ANN as mapped above (based on…the at least five tuning parameters) [Col. 1, lines 26-49], from (based on) the error ratio (gradient descent value) determined from calculated “error” (the amount of training error) from running a training pattern a “preset number of iterations” (the at least five tuning parameters as mapped above); wherein “[t]he value of the weights (weighting parameters) used to initialize the ANN (neural network) are adjusted (to update) based on the calculated error and the adaptive learning rate”).

Black at least implies select, prior to an initial training epoch [iteration as claimed in claims 9 and 15], at least five tuning parameters, however Li teaches select, prior to an initial training epoch [iteration as claimed in claims 9 and 15], at least five tuning parameters (paragraphs 0071, 0095, and 0140-0159 teach before training a NN (prior to an initial training epoch), determining training iteration criteria and stopping (a number of epochs/iterations), including “max_iters (which refers to the maximum number of training iterations) (first tuning parameter), min_iters (which refers to the minimum number of training iterations) (second tuning parameter), keep_lr_iters (which refers to the number of iterations that keep the initial learning rate) (third tuning parameter)…end_halving_impr (which is used to determine when to terminate the training, for example, 0.001) (fourth tuning parameter)”, “start_halving_impr” (fifth tuning parameter), “halving_factor” (alternative fifth tuning parameter), and if a word error ratio (WER) is “below a certain threshold” that is predetermined (alternative fifth tuning parameter)).
Further, Black at least implies determining a learning rate based on (1) the gradient descent value, (2) the at least five tuning parameters, and (3) a training error threshold to determine whether to terminate training, the training error threshold separate from the at least five tuning parameters; and update weighting parameters of the neural network based on the learning rate (see mappings above), however Li teaches determining a learning rate based on (1) the gradient descent value, (2) the at least five tuning parameters, and (3) a training error threshold to determine whether to terminate training, the training error threshold separate from the at least five tuning parameters (paragraphs 0071, 0095, and 0140-0159 teach changing the “learning rate” (determining a learning rate) based on the number of training iterations and the determined “fine-tuning” parameters as mapped above (the at least five tuning parameters), the “gradient descent algorithm” results (gradient descent value), and WER results (training error threshold to determine whether to terminate training). Further, the gradient descent is changed upon detecting significant parameter changes, and the WER is calculated from training with the parameters (the training error threshold separate from the at least five tuning parameters).).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Li’s teachings of determining fine-tuning parameters before training a neural network and learning rate calculations into Black’s teaching of training an ANN through gradient descent and adaptive learning rate techniques in order to improve neural network training through specific “convergence criteria” (Li, paragraphs 0071, 0095, and 0140-0159).
Further still, Black at least implies an apparatus comprising: memory and at least one logic circuit; a non-transitory computer-readable storage medium comprising instructions which, when executed, cause a processor to, perform a method for training a neural network (see mapping above), however Goel teaches an apparatus, a non-transitory computer-readable storage medium comprising instructions which, when executed, cause a processor to, perform a method for training a neural network (paragraphs 0030-0036 teach a CRM storing instructions that when executed by the processor (at least one logic circuit), as taught in paragraphs 0083-0088, stop iterating a DNN training process when reaching the set max iterations for “early stopping” as to not “overfit” the model and also adjusting the learning rate.). 
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0030-0036 and 0083-0088).
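For illustration only, the schedule described in the Li mapping above can be sketched as follows. The parameter names (max_iters, min_iters, keep_lr_iters, start_halving_impr, end_halving_impr, halving_factor) are those quoted from Li. The control flow, the default values other than the quoted 0.001, and the helper names `Schedule` and `simulate` are assumptions made for this sketch, including the assumed meanings of start_halving_impr and halving_factor; they are not taken from Li or from the claims.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    # Tuning parameters selected before the first training iteration.
    max_iters: int = 20                # maximum number of training iterations
    min_iters: int = 0                 # minimum number of training iterations
    keep_lr_iters: int = 0             # iterations that keep the initial learning rate
    start_halving_impr: float = 0.01   # assumed: improvement below this starts halving
    end_halving_impr: float = 0.001    # improvement below this ends training
    halving_factor: float = 0.5        # assumed: multiplier applied once halving starts


def simulate(losses, initial_loss, lr, cfg):
    """Return the learning rate used at each iteration.

    `losses` holds the held-out error observed after each iteration and
    `initial_loss` is the error before training; both are assumed positive.
    """
    used = []
    prev = initial_loss
    halving = False
    for i, loss in enumerate(losses[: cfg.max_iters]):
        used.append(lr)
        rel_impr = (prev - loss) / prev
        prev = loss
        if i < cfg.keep_lr_iters:
            continue  # keep the initial learning rate for the first iterations
        if halving and rel_impr < cfg.end_halving_impr and i + 1 >= cfg.min_iters:
            break  # improvement too small: terminate training
        if rel_impr < cfg.start_halving_impr:
            halving = True
        if halving:
            lr *= cfg.halving_factor
    return used
```

In this sketch the iteration count can never exceed max_iters, and the termination test relies on an observed error value that is separate from the preselected parameters.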

Regarding claims 2 and 10, the combination of Black, Li, and Goel teach all the claim limitations of claims 1 and 9 above; and further teach the at least one logic circuit is to determine the at least five tuning parameters such that a training process is completed within a maximum number of epochs [iterations as claimed in claim 10] (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in Col. 1, lines 26-49, Col. 4, line 64-Col. 5, line 44, Col. 11, lines 37-47, and Col. 12, lines 4-13, and Fig. 10A-10C where “[t]he value of the weights (five tuning parameters as mapped above in claim 1) used to initialize the ANN (neural network) are adjusted (determined) based on the calculated error and the adaptive learning rate” and “fine-tuning parameters” in a “preset number of iterations (maximum/preselected number of epochs/iterations)”.).
Black at least implies the at least one logic circuit is to determine the at least five tuning parameters such that a training process is completed within a maximum number of epochs [iterations as claimed in claim 10] (see mapping above), however Li teaches …determine the at least five tuning parameters such that a training process is completed within a maximum number of epochs [iterations as claimed in claim 10] (paragraphs 0071, 0095, and 0140-0159 teach before training a NN, determining training iteration and stopping criteria, including the parameters as mapped above in claim 1 (determine the at least five tuning parameters) such as “max_iters (which refers to the maximum number of training iterations)”; wherein “the total number of iterations shall not be more than max_iters” (such that a training process is completed within a maximum number of epochs/iterations)).
Black, Li and Goel are combinable for the same rationale as set forth above with respect to claims 1, 9, and 15.
Further, Goel teaches at least one logic circuit (paragraphs 0030-0036 teach a CRM storing instructions that when executed by the processor (at least one logic circuit), as taught in paragraphs 0083-0088, stop iterating a DNN training process (neural network training process) while determining hyperparameters (tuning parameters) when reaching the set max iterations (within a maximum number of epochs) for “early stopping” as to not “overfit” the model.). 
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0030-0036 and 0083-0088).

Regarding claim 3, the combination of Black, Li, and Goel teach all the claim limitations of claim 2 above; and further teach a tuning parameter memory to store the at least five tuning parameters (Goel, paragraphs 0005, 0030-0036, 0053-0055, 0060, and 0083-0088 teach storing the annealing schedule on a storage device (memory) or CRM, along with configurations and settings for the configurations and parameters used in further training the neural network, wherein the parameters are taught to include “number of layers, number of nodes per layer, number of training iterations, learning rate, etc.”, “learning rate, regularization strength, etc”, and “dropout rate” (five parameters)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM, storage devices, and processors tuning parameters and hyperparameters as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0005, 0030-0036, 0053-0055, 0060, and 0083-0088).

Regarding claims 5, 12, and 17, the combination of Black, Li, and Goel teach all the claim limitations of claims 1, 9, and 15 above; and further teach determine an amount of training error using the updated weighting parameters (Black, Col. 4, line 64-Col. 6, line 16 teach adjusting the weights of the ANN for each training iteration, and further calculating error for each iteration to examine the accuracy of the updated weights); and 
the at least one logic circuit is to, in response to a determination that the amount of training error is less than a training error threshold, terminate the iterative training (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in Col. 5, line 26-Col. 6, line 16 and Col. 15, lines 25-48 where the ANN training is completed (terminated) when the prediction error of an iteration, of a plurality of iterations (iterative training), is compared to an “error goal” (threshold) and found to be “less than or equal to the final error goal” (threshold)).
Black at least implies at least one logic circuit (see mapping above), however Goel teaches at least one logic circuit (paragraphs 0030-0036 teach a CRM storing instructions that when executed by the processor (at least one logic circuit), as taught in paragraphs 0083-0088, stop (terminate) iterating a DNN training process (neural network training process) while determining hyperparameters when reaching the set max iterations for “early stopping” as to not “overfit” the model.). 
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0030-0036 and 0083-0088).

Regarding claims 6, 13, and 18, the combination of Black, Li, and Goel teach all the claim limitations of claims 1, 9, and 15 above; and further teach the learning rate is a first learning rate, and the at least one logic circuit is to determine a second learning rate corresponding to a subsequent epoch [or training iteration as claimed in claims 13 and 18], the second learning rate different from the first learning rate (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in Col. 4, line 64-Col. 5, line 44 and Col. 12, lines 4-13 where an “adaptive learning rate is calculated” continuously (first learning rate is updated) within a “preset number of iterations”, for an ANN (neural network) training method, from (based on) the error ratio following each iteration (corresponding to a subsequent epoch/iteration) to then use a new adapted learning rate value (second learning rate different from the first learning rate for the next training iteration)).
Black at least implies at least one logic circuit (see mapping above), however Goel teaches at least one logic circuit (paragraphs 0030-0036 teach a CRM storing instructions that when executed by the processor (at least one logic circuit), as taught in paragraphs 0083-0088, stop iterating a DNN training process (neural network training process) while determining hyperparameters when reaching the set max iterations for “early stopping” as to not “overfit” the model.). 
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0030-0036 and 0083-0088).

Regarding claim 8, the combination of Black, Li, and Goel teach all the claim limitations of claim 1 above; and further teach wherein the at least one logic circuit is to process an input to generate an output based on the weighting parameters (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (the at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in Col. 4, line 64-Col. 5, line 44 and Col. 12, lines 4-13 where “training method of the present invention processes the input layer training patterns in the ANN to obtain output patterns” for each iteration of the ANN’s training, and the weight values of the ANN are “adjusted” for each iteration based on the “calculated error”).
Black at least implies at least one logic circuit (see mapping above), however Goel teaches at least one logic circuit (paragraphs 0030-0036 teach a CRM storing instructions that when executed by the processor (at least one logic circuit), as taught in paragraphs 0077-0079 and 0083-0088, train a DNN by updating the model weights and/or parameters, which are further adjusted based on input training data and the “output results”.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, to include a CRM and processor implementing DNN training datasets and weight/parameter updating as taught by Goel in order to optimize neural network training within a computing environment while avoiding overfitting (Goel, paragraphs 0030-0036, 0077-0079 and 0083-0088).

Regarding claim 22, the combination of Black, Li, and Goel teach all the claim limitations of claim 9 above; and further teach wherein a first product of a first tuning parameter and a second tuning parameter is less than one (Li, paragraphs 0071, 0095, 0140-0161, and claims 2 and 13 teach performing the training “for at least one iterations” therefore interpreted as a minimum iteration value (first tuning parameter), and “end_halving_impr (which is used to determine when to terminate the training, for example, 0.001)” (second tuning parameter). Therefore, one iteration multiplied by “0.001” would be less than one (first product < one)).
Black, Li and Goel are combinable for the same rationale as set forth above with respect to claim 9.

Regarding claim 23, the combination of Black, Li, and Goel teach all the claim limitations of claim 22 above; and further teach wherein a second product of the first tuning parameter and a third tuning parameter is greater than one, the first tuning parameter and the third tuning parameter used to determine the maximum number of iterations (Li, paragraphs 0071, 0095, 0140-0161, and claims 2 and 13 teach performing the training “for at least one iterations” therefore interpreted as a minimum iteration value (first tuning parameter), and the training including a “present iteration” and “previous iteration” therefore interpreted to have a max iteration value of at least two (third tuning parameter). Therefore, one iteration multiplied by two iterations would be more than one (second product > one). Further, with a determined max and min iteration, the iterations are thus defined (the first tuning parameter and the third tuning parameter used to determine the maximum number of iterations)).
Black, Li and Goel are combinable for the same rationale as set forth above with respect to claim 9.
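For illustration only, the arithmetic relied upon in the mappings of claims 22 and 23 can be checked as follows. The values are those stated in the mappings (a minimum of one iteration, the quoted example value 0.001 for end_halving_impr, and a maximum of at least two iterations); the variable names are hypothetical labels chosen for this sketch.

```python
# Values as interpreted in the mappings above; names are illustrative only.
min_iters = 1             # first tuning parameter: "at least one" iteration
end_halving_impr = 0.001  # second tuning parameter: quoted example value
max_iters = 2             # third tuning parameter: present plus previous iteration

first_product = min_iters * end_halving_impr  # claim 22: must be less than one
second_product = min_iters * max_iters        # claim 23: must be greater than one

print(first_product < 1, second_product > 1)
```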

Regarding claim 24, the combination of Black, Li, and Goel teach all the claim limitations of claim 9 above; and further teach wherein the at least five tuning parameters are positive (Black, Figs. 4, 6, and 7A-7B depict weight values being in the upper right quadrant of a graph and thus positive (wherein the at least five tuning parameters are positive)).
Black at least implies wherein the at least five tuning parameters are positive (see mapping above), however Li teaches wherein the at least five tuning parameters are positive (paragraphs 0071, 0095, 0140-0161 teach determining training iteration criteria and stopping, including max and minimum iterations, “keep_lr_iters (which refers to the number of iterations that keep the initial learning rate)…end_halving_impr (which is used to determine when to terminate the training, for example, 0.001)”, a WER threshold taught to be a percentage (parameters are positive); wherein training iterations are well known in the art to be ≥ zero (parameters are positive)).
Black, Li and Goel are combinable for the same rationale as set forth above with respect to claim 9.

Claims 4, 11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Black (US Patent 6269351), in view of Li et al (US Pub 20180046919) hereinafter Li, in view of Goel et al (US Pub 20160307098) hereinafter Goel, and further in view of Tee et al (US Pub 20170250932), hereinafter Tee.
Regarding claims 4, 11, and 16, the combination of Black, Li, and Goel teach all the claim limitations of claims 1, 9, and 15 above; and further teach wherein the at least one logic circuit is to store a number of epochs [count a number of iterations as claimed in claims 11 and 16] that have elapsed during the iterative training, and the at least one logic circuit is to, in response to determining that the number of epochs that have elapsed meets or exceeds a maximum number of epochs, terminate the iterative training (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (at least one logic circuit) and memories (alternative at least one logic circuit) to perform the embodiments of the disclosure, such as in Col. 4, line 64-Col. 5, line 44, Col. 11, lines 37-47, and Col. 12, lines 4-13 where during training of an ANN (training process) “[t]he value of the weights used to initialize the ANN (neural network) are adjusted based on the calculated error and the adaptive learning rate” and “fine-tuning parameters” within a “preset number of iterations (maximum/preselected number of epochs)”. This is understood that the system is able to track (count) the number of iterations from a plurality of iterations (iterative training) since it can train a neural network within “a preset number of iterations” and maintain training speed.).
Black at least implies wherein the at least one logic circuit is to store a number of epochs [count a number of iterations as claimed in claims 11 and 16] that have elapsed during the iterative training, and the at least one logic circuit is to, in response to determining that the number of epochs that have elapsed meets or exceeds a maximum number of epochs, terminate the iterative training (see mapping above), however Tee teaches wherein the at least one logic circuit is to store a number of epochs [count a number of iterations as claimed in claims 11 and 16] that have elapsed during the iterative training, and the at least one logic circuit is to, in response to determining that the number of epochs that have elapsed meets or exceeds a maximum number of epochs, terminate the iterative training (Tee, paragraphs 0199 and 0205-0207 teach a processor/memory controller (at least one logic circuit) executing CRM instructions that, in paragraph 0237, terminates the neural network training (training process) iterations upon “meeting the error goal, or, exceeding the number of epochs or iteration steps”, thus understood to be counting epochs to determine when a max has been reached/exceeded.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, as modified by a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel, to include an iteration/epoch counter to monitor when the established number of epochs has been reached to terminate the training iterations as taught by Tee in order to improve training time efficiency by limiting the number of training iterations/epochs (Tee, paragraphs 0199, 0205-0207, and 0237).
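For illustration only, the two termination conditions discussed above (meeting the error goal, or the elapsed epoch count meeting or exceeding a maximum) can be sketched as a counting loop. The function `train`, the callback `step`, and the returned reason strings are hypothetical names for this sketch and are not drawn from Black, Tee, or the claims; max_epochs is assumed to be at least one.

```python
def train(step, max_epochs, error_goal):
    """Run `step` once per epoch until a stopping condition is met.

    `step(epoch_index)` is assumed to perform one epoch of training and
    return the training error measured after that epoch.
    """
    epochs_elapsed = 0  # stored count of epochs that have elapsed
    while True:
        error = step(epochs_elapsed)
        epochs_elapsed += 1
        if error <= error_goal:
            return epochs_elapsed, "error_goal"   # error goal met
        if epochs_elapsed >= max_epochs:
            return epochs_elapsed, "max_epochs"   # epoch budget exhausted
```

For example, with an error that decays as 1/(epoch + 1), a goal of 0.25 is met on the fourth epoch when the budget is ten epochs, while a budget of two epochs ends training first.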

Claims 7, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Black (US Patent 6269351), in view of Li et al (US Pub 20180046919) hereinafter Li, in view of Goel et al (US Pub 20160307098) hereinafter Goel, and further in view of Smith (“Cyclical Learning Rates for Training Neural Networks”, 2017).
Regarding claims 7, 14, and 19, the combination of Black, Li, and Goel teach all the claim limitations of claims 1, 9, and 15 above; and further teach the at least one logic circuit is to determine whether the learning rate is greater than a learning rate threshold, and, in response to determining that the learning rate is greater than the learning rate threshold, set the learning rate to the learning rate threshold (Black, Col. 4, lines 10-20 and Col. 23, lines 26-33 teach a method and system understood to be a computer system including one or more processors (at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in Col. 8, lines 9-21 and Col. 19, lines 4-20 for limiting the “adaptive learning rate” to “unity” being less than 1, thus understood as not permitting the learning rate to be reset above unity).
Black at least implies the at least one logic circuit is to determine whether the learning rate is greater than a learning rate threshold, and, in response to determining that the learning rate is greater than the learning rate threshold, set the learning rate to the learning rate threshold (see mapping above), however Smith teaches the at least one logic circuit is to determine whether the learning rate is greater than a learning rate threshold, and, in response to determining that the learning rate is greater than the learning rate threshold, set the learning rate to the learning rate threshold (Smith, abstract, sections 1 and 3.1 teach computing a learning rate and utilizing software understood to be executed on a computer system including one or more processors (at least one logic circuit) and memories to perform the embodiments of the disclosure, such as in sections 3.1, 3.2, and Fig. 2, for bounding a learning rate value to stay within a “maximum learning rate boundary” (determining that the learning rate is greater than the learning rate threshold, set the learning rate to the learning rate threshold)). 
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training an ANN through gradient descent and adaptive learning rate techniques, as taught by Black as modified by determining fine-tuning parameters before training a neural network and learning rate calculations as taught by Li, as modified by a CRM and processor implementing early stopping by monitoring iteration count and learning rate adjusting as taught by Goel, to include bounding the computed learning rate for gradient descent techniques as taught by Smith in order to optimize computation speed efficiency when calculating a bounded learning rate (Smith, abstract, sections 1, 3.1, 3.2, and Fig. 2).
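For illustration only, the limitation at issue (setting the learning rate to the threshold when it exceeds the threshold) and a bounded cyclical schedule can be sketched as follows. `clamp_learning_rate` restates the claim language directly. `triangular_lr` follows the commonly cited form of the triangular policy and should be checked against Smith, section 3.1. Both function names and all numeric values in the example are assumptions for this sketch.

```python
import math


def clamp_learning_rate(lr, lr_threshold):
    # If the learning rate is greater than the threshold, set it to the threshold.
    return lr_threshold if lr > lr_threshold else lr


def triangular_lr(iteration, step_size, base_lr, max_lr):
    # Learning rate rises linearly from base_lr to max_lr over step_size
    # iterations, then falls back, so it never leaves [base_lr, max_lr].
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With a step size of 4, a base rate of 0.001, and a maximum boundary of 0.006, the schedule starts at the base rate at iteration 0, reaches the maximum boundary at iteration 4, and returns to the base rate at iteration 8.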

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123