DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Office Action is in response to the communication filed on 6/14/2019.
	Claims 1-20 are being considered on the merits.
Drawings
	The drawings filed on 6/14/2019 are accepted. 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Step 1 Analysis: The claims, as a whole, recite a series of steps for obtaining two groups of hyperparameters and comparing them in order to select a third group of hyperparameters. Therefore, the claims recite a process and fall within one of the four statutory categories of patentable subject matter.
Claim 1 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of claim 1 include obtaining groups of values of hyper parameters, obtaining a first result, obtaining a second result, obtaining a third result, and using the results and hyper parameters to select a value of a hyper parameter. The final limitation, using data to select a value, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the additional limitations of claim 1, including the first four “obtaining” steps, amount to mere data gathering and constitute extra-solution activity. Moreover, the reference to a “machine learning algorithm” in claim 1 merely links the abstract idea to a particular technological environment.
Step 2b Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional “obtaining” steps amount to no more than insignificant extra-solution activity appended to the abstract idea of selecting a hyper parameter. Additionally, the use of a machine learning algorithm merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claims 2-14 are rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of dependent claims 2-14 consist only of further specifying the type of data to be used as the “group of values” in independent claim 1. Such limitations amount to selecting a particular data source or type of data to be manipulated and do not amount to significantly more.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because, as stated above, the additional limitations of claims 2-14 amount to selecting a particular data source or type of data to be manipulated.
Step 2b Analysis: Claims 2-14 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional limitations of claims 2-14 amount to selecting a particular data source or type of data to be manipulated and do not amount to significantly more.

Claim 15 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of dependent claim 15 consist of training the machine learning algorithm to generate an inference model, using the inference model to generate outputs, and determining results by comparing the outputs. The final limitation, “determining” a result, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the “training” and “using” limitations of claim 15 merely link the abstract idea to a particular technological environment.
Step 2b Analysis: Claim 15 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the reference to a machine learning algorithm and inference model merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claim 16 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of dependent claim 16 consist of training the machine learning algorithm to generate an inference model, using the inference model to generate outputs, providing outputs to at least one user, receiving feedback from the user, and determining a result using the feedback. The final limitation, “determining” a result, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the “training” and “using” limitations of claim 16 merely link the abstract idea to a particular technological environment. The additional “providing” and “receiving” steps of claim 16 amount to mere data gathering and constitute extra-solution activity.
Step 2b Analysis: Claim 16 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional “providing” and “receiving” steps amount to no more than insignificant extra-solution activity appended to the abstract idea. Additionally, the reference to a machine learning algorithm and inference model merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claim 17 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of dependent claim 17 consist of determining an embedding and using the determined embedding to select a value. The first limitation, “determining” an embedding, is a mathematical concept, which constitutes an abstract idea. The second limitation, “using” the embedding to select a value, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the “determining” and “using” limitations of claim 17 are only further modified by reference to a “machine learning algorithm,” which merely links the abstract idea to a particular technological environment.
Step 2b Analysis: Claim 17 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional element of a “machine learning algorithm” merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claim 18 is rejected under 35 U.S.C. 101. Dependent claim 18 does not cure the deficiencies noted in the rejections of independent claim 1 and dependent claim 17. Where the limitations are determining a preferable result and selecting a hyper parameter, such limitations amount to mere data gathering and do not amount to significantly more.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of dependent claim 18 consist of determining which result is preferable, and selecting one value of a hyper parameter from the first group or selecting a hyper parameter value from the second or third groups. All three limitations, the “determining” and the two “selecting” steps, are mental processes, which constitute an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the “determining” and “selecting” limitations of claim 18 are only further modified by reference to a “machine learning algorithm,” which merely links the abstract idea to a particular technological environment.
Step 2b Analysis: Claim 18 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional element of a “machine learning algorithm” merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claim 19 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of claim 19 include using a processor to obtain groups of values of hyper parameters, obtain a first result, obtain a second result, obtain a third result, and use the results and hyper parameters to select a value of a hyper parameter. The final limitation, using data to select a value, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the additional limitations of claim 19, including the first four “obtaining” steps, amount to mere data gathering and constitute extra-solution activity. Moreover, the references to a “processor” and a “machine learning algorithm” in claim 19 merely link the abstract idea to a generic computer component and a particular technological environment.
Step 2b Analysis: Claim 19 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional “obtaining” steps amount to no more than insignificant extra-solution activity appended to the abstract idea of selecting a hyper parameter. Additionally, the use of a processor and a machine learning algorithm merely links the abstract idea to a generic computer component and a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.

Claim 20 is rejected under 35 U.S.C. 101.
Step 2a Prong 1 Analysis: Because the claims fall within one of the four statutory categories (Step 1 above), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The limitations of claim 20 include obtaining groups of values of hyper parameters, obtaining a first result, obtaining a second result, obtaining a third result, and using the results and hyper parameters to select a value of a hyper parameter. The final limitation, using data to select a value, is a mental process, which constitutes an abstract idea.
Step 2a Prong 2 Analysis: The judicial exception (the abstract idea) is not integrated into a practical application because the additional limitations of claim 20, including the first four “obtaining” steps, amount to mere data gathering and constitute extra-solution activity. Moreover, the reference to a “machine learning algorithm” in claim 20 merely links the abstract idea to a particular technological environment.
Step 2b Analysis: Claim 20 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in Step 2a Prong 2, the additional “obtaining” steps amount to no more than insignificant extra-solution activity appended to the abstract idea of selecting a hyper parameter. Additionally, the use of a machine learning algorithm merely links the abstract idea to a particular technological environment or is otherwise well-understood, routine, and conventional activity that is recited at a high level of generality.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-2, 6-8, 11, 14-15, and 17-20 are rejected under 35 U.S.C. 102 as being anticipated by Koutsoukas et al. (“Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data”, 2017, Journal of Cheminformatics, hereinafter “Koutsoukas”).

Regarding claims 1, 19, and 20, Koutsoukas teaches a method, system, and a non-transitory computer readable medium, comprising: 
obtaining a first group of values of hyper parameters, a second group of values of hyper parameters and a third group of values of hyper parameters, the second group of values of hyper parameters differs from the first group of values of hyper parameters, and the third group of values of hyper parameters differs from the first group of values of hyper parameters and the second group of values of hyper parameters (Koutsoukas, pg. 11 and fig. 3: “The hyper-parameters selected and explored for DNNs were:...(a) Activation functions compared the rectified linear units (ReLU), Sigmoid (“sigm”) and Tanh (“tanh”), Fig. 1c illustrates the shapes of the functions. (b) Number of neurons in each hidden layer (5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500), (c) learning rate “η” of (1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴), (d) number of hidden layers up to 4, (e) regularization technique applied”. Examiner notes that the broadest reasonable interpretation of “obtaining” is to procure or acquire by any means, such as by selecting);
obtaining a first result of training a machine learning algorithm using a first plurality of training examples and the first group of values of hyper parameters (Koutsoukas, fig. 3: “Effect of the hyper-parameters (i) number of hidden layers, (ii) number of neurons and (iii) dropout regularization on the performance of DNNs measured by MCC as evaluation metric. DNN configuration A shows results obtained by DNN with a single hidden layer and 10 neurons, ReLU activation function and no regularization averaged over the seven activity dataset, B a two hidden layered DNN with 500 neurons in each layer, ReLU activation function and no regularization, C two hidden layers with 3000 neurons per hidden layer and dropout regularization (0% for the input and 50% for hidden layers),”);
obtaining a second result of training the machine learning algorithm using a second plurality of training examples and the second group of values of hyper parameters (Koutsoukas, fig. 3: “Effect of the hyper-parameters (i) number of hidden layers, (ii) number of neurons and (iii) dropout regularization on the performance of DNNs measured by MCC as evaluation metric. DNN configuration A shows results obtained by DNN with a single hidden layer and 10 neurons, ReLU activation function and no regularization averaged over the seven activity dataset, B a two hidden layered DNN with 500 neurons in each layer, ReLU activation function and no regularization, C two hidden layers with 3000 neurons per hidden layer and dropout regularization (0% for the input and 50% for hidden layers),”); 
obtaining a third result of training the machine learning algorithm using a third plurality of training examples and the third group of values of hyper parameters (Koutsoukas, fig. 3: “Effect of the hyper-parameters (i) number of hidden layers, (ii) number of neurons and (iii) dropout regularization on the performance of DNNs measured by MCC as evaluation metric. DNN configuration A shows results obtained by DNN with a single hidden layer and 10 neurons, ReLU activation function and no regularization averaged over the seven activity dataset, B a two hidden layered DNN with 500 neurons in each layer, ReLU activation function and no regularization, C two hidden layers with 3000 neurons per hidden layer and dropout regularization (0% for the input and 50% for hidden layers),”); 
using the first result, the second result, the third result, the first group of values of hyper parameters, the second group of values of hyper parameters and the third group of values of hyper parameters to select at least one value of a hyper parameter for a prospective training of the machine learning algorithm (Koutsoukas, fig. 3: “Effect of the hyper-parameters (i) number of hidden layers, (ii) number of neurons and (iii) dropout regularization on the performance of DNNs measured by MCC as evaluation metric… D two hidden layers with 3000 neurons per hidden layer and dropout regularization (20% for the input and 50% for hidden layers)”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).
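For illustration only, the sequence of limitations mapped above (three differing groups of hyper parameter values, one training result per group, and a selection for a prospective training) can be sketched as follows. This is a minimal sketch under stated assumptions; it is neither the applicant's implementation nor code from Koutsoukas. The function `train_and_score` is a hypothetical stand-in that returns a made-up deterministic score instead of training a network, and the three groups are illustrative values taken from the ranges quoted from Koutsoukas.

```python
import math
from typing import Dict, List

HyperParams = Dict[str, float]


def train_and_score(params: HyperParams, examples: List[int]) -> float:
    """Hypothetical stand-in for training; returns a toy score, not a real metric.

    Assumption: the score peaks at a learning rate of 1e-2 and grows with the
    number of neurons. The training examples are accepted but unused here.
    """
    lr_penalty = abs(math.log10(params["learning_rate"]) + 2.0)
    return 0.5 + 0.0001 * params["neurons"] - 0.1 * lr_penalty


def select_for_prospective_training(
    groups: List[HyperParams], results: List[float]
) -> HyperParams:
    """Use the results and the groups to pick values for a later training run."""
    best_index = max(range(len(results)), key=lambda i: results[i])
    return groups[best_index]


# Three differing groups of hyper parameter values (illustrative only).
groups = [
    {"learning_rate": 1e-1, "neurons": 10, "hidden_layers": 1},
    {"learning_rate": 1e-2, "neurons": 500, "hidden_layers": 2},
    {"learning_rate": 1e-3, "neurons": 3000, "hidden_layers": 2},
]

# One (placeholder) plurality of training examples per group.
example_sets = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# First, second, and third results.
results = [train_and_score(g, ex) for g, ex in zip(groups, example_sets)]

# Selection of values for the prospective training.
selected = select_for_prospective_training(groups, results)
```

Under the toy scoring rule, the third group scores highest and is selected; with a real training routine the outcome would depend on the data and the model.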
Regarding claim 2, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
the first group of values of hyper parameters comprises a first learning rate value, the second group of values of hyper parameters comprises a second learning rate value, the third group of values of hyper parameters comprises a third learning rate value, and further comprising using the first result, the second result, the third result, the first learning rate value, the second learning rate value and the third learning rate value to select a learning rate value for the prospective training of the machine learning algorithm (Koutsoukas, pg. 11 and fig. 3: “The hyper-parameters selected and explored for DNNs were:...(a) Activation functions compared the rectified linear units (ReLU), Sigmoid (“sigm”) and Tanh (“tanh”), Fig. 1c illustrates the shapes of the functions. (b) Number of neurons in each hidden layer (5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500), (c) learning rate “η” of (1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴), (d) number of hidden layers up to 4, (e) regularization technique applied”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).
Regarding claim 6, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
the first group of values of hyper parameters comprises a first number of layers, the second group of values of hyper parameters comprises a second number of layers, the third group of values of hyper parameters comprises a third number of layers, and further comprising using the first result, the second result, the third result, the first number of layers, the second number of layers and the third number of layers to select a number of layers for the prospective training of the machine learning algorithm (Koutsoukas, pg. 11 and fig. 3: “The hyper-parameters selected and explored for DNNs were:...(a) Activation functions compared the rectified linear units (ReLU), Sigmoid (“sigm”) and Tanh (“tanh”), Fig. 1c illustrates the shapes of the functions. (b) Number of neurons in each hidden layer (5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500), (c) learning rate “η” of (1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴), (d) number of hidden layers up to 4, (e) regularization technique applied”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).


Regarding claim 7, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
the first group of values of hyper parameters comprises a first number of artificial neurons, the second group of values of hyper parameters comprises a second number of artificial neurons, the third group of values of hyper parameters comprises a third number of artificial neurons, and further comprising using the first result, the second result, the third result, the first number of artificial neurons, the second number of artificial neurons and the third number of artificial neurons to select a number of artificial neurons for the prospective training of the machine learning algorithm (Koutsoukas [] and fig. 4: “The hyper-parameters selected and explored for DNNs were:...(a) Activation functions compared the rectified linear units (ReLU), Sigmoid (“sigm”) and Tanh (“tanh”), Fig. 1c illustrates the shapes of the functions. (b) Number of neurons in each hidden layer (5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500), (c) learning rate “η” of (1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴), (d) number of hidden layers up to 4, (e) regularization technique applied”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).

Regarding claim 8, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
the first group of values of hyper parameters comprises a first selection of an activation function, the second group of values of hyper parameters comprises a second selection of an activation function, the third group of values of hyper parameters comprises a third selection of an activation function, and further comprising using the first result, the second result, the third result, the first selection of an activation function, the second selection of an activation function and the third selection of an activation function to select an activation function for the prospective training of the machine learning algorithm (Koutsoukas, pg. 11 and fig. 3: “The hyper-parameters selected and explored for DNNs were:...(a) Activation functions compared the rectified linear units (ReLU), Sigmoid (“sigm”) and Tanh (“tanh”), Fig. 1c illustrates the shapes of the functions. (b) Number of neurons in each hidden layer (5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500), (c) learning rate “η” of (1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴), (d) number of hidden layers up to 4, (e) regularization technique applied”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).

Regarding claim 11, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
the first group of values of hyper parameters comprises a first selection of a kernel function, the second group of values of hyper parameters comprises a second selection of a kernel function, the third group of values of hyper parameters comprises a third selection of a kernel function, and further comprising using the first result, the second result, the third result, the first selection of a kernel function, the second selection of a kernel function and the third selection of a kernel function to select a kernel function for the prospective training of the machine learning algorithm (Koutsoukas, fig. 4: “The difference ranged on average from 0.149 MCC units between DNN and NB, 0.092 DNN and kNN, 0.052 DNN and SVM with linear kernel, 0.021 DNN and RF and 0.009 DNN and SVM with ‘rbf’ kernel”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).

Regarding claim 14, Koutsoukas teaches the method of claim 1 (above). [] further teaches: 
the first group of values of hyper parameters comprises a first sampling criterion used to sample the first plurality of training examples from a base group of training examples, the second group of values of hyper parameters comprises a second sampling criterion used to sample the second plurality of training examples from the base group of training examples, the third group of values of hyper parameters comprises a third sampling criterion used to sample the third plurality of training examples from a base group of training examples, and further comprising using the first result, the second result, the third result, the first sampling criterion, the second sampling criterion and the third sampling criterion to select a sampling criterion for sampling a plurality of training examples from the base group of training examples for the prospective training of the machine learning algorithm (Koutsoukas, pg. 9: “In total seven diverse bio activity classes were selected and used in the study: (a) Carbonic Anhydrase II (ChEMBL205), a protein lyase, (b) Cyclin-dependent kinase 2 (CHEMBL301), a protein kinase, (c) ether-a-go-go-related gene potassium channel 1 (HERG) (CHEMBL240), a voltage-gated ion channel, (d) Dopamine D4 receptor (CHEMBL219), a monoamine GPCR, (e) Coagulation factor X (CHEMBL244), a serine protease, (f) Cannabinoid CB1 receptor (CHEMBL218), a lipid-like GPCR and (g) Cytochrome P450 19A1 (CHEMBL1978), a cytochrome P450”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as a baseline for selecting the same or a different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).

Regarding claim 15, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
training the machine learning algorithm using the first plurality of training examples and the first group of values of hyper parameters to generate a first inference model (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance achieved by 6 different network structures over 7 different datasets).
using the first inference model and a plurality of testing examples to generate a first plurality of outputs (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance achieved by 6 different network structures over 7 different datasets).
comparing the first plurality of outputs and a plurality of desired results to determine the first result (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance achieved by 6 different network structures over 7 different datasets).
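For illustration only, the three limitations of claim 15 (training to generate an inference model, using the model on testing examples to generate outputs, and comparing the outputs with desired results to determine a result) can be sketched with the Matthews correlation coefficient (MCC), the evaluation metric named in the cited Table 4 caption. This is a minimal sketch under stated assumptions and is not code from Koutsoukas or the application: the threshold classifier, the `offset` hyper parameter, and all data values are hypothetical; only the MCC formula itself is standard.

```python
import math
from typing import Callable, List, Sequence, Tuple


def train_threshold_model(
    examples: Sequence[Tuple[float, int]], offset: float
) -> Callable[[float], int]:
    """Hypothetical 'training': place a threshold midway between class means.

    `offset` is an illustrative hyper parameter that shifts the threshold.
    """
    zeros = [x for x, label in examples if label == 0]
    ones = [x for x, label in examples if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2 + offset
    return lambda x: 1 if x > threshold else 0


def matthews_corrcoef(desired: Sequence[int], outputs: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(desired, outputs))
    tp = sum(1 for d, o in pairs if d == 1 and o == 1)
    tn = sum(1 for d, o in pairs if d == 0 and o == 0)
    fp = sum(1 for d, o in pairs if d == 0 and o == 1)
    fn = sum(1 for d, o in pairs if d == 1 and o == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when the denominator is 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Training examples and a first group of hyper parameter values (illustrative).
training_examples = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
first_model = train_threshold_model(training_examples, offset=0.0)

# Testing examples and their desired results (illustrative).
testing_inputs: List[float] = [0.0, 0.3, 0.6, 1.0, 0.45]
desired_results: List[int] = [0, 0, 1, 1, 1]

# First plurality of outputs, then the first result from the comparison.
first_outputs = [first_model(x) for x in testing_inputs]
first_result = matthews_corrcoef(desired_results, first_outputs)
```

In this toy data one positive example (0.45) falls below the learned threshold, so the first result is below the perfect score of 1.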

Regarding claim 17, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
determining an embedding in a mathematical space of at least a value of the first group of values of hyper parameters, a value of the second group of values of hyper parameters and a value of the third group of values of hyper parameters (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance as a mathematical computation achieved by 6 different network structures over 7 different datasets).
using the determined embedding of the values in the mathematical space to select the at least one value of the hyper parameter for the prospective training of the machine learning algorithm (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance as a mathematical computation achieved by 6 different network structures over 7 different datasets. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.)

Regarding claim 18, Koutsoukas teaches the method of claim 1 (above). Koutsoukas further teaches: 
determining whether the first result is preferable to the second result and the third result (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance as a mathematical computation achieved by 6 different network structures over 7 different datasets. Examiner notes that the broadest reasonable interpretation of “determining” means to deduce as in the result of a mathematical calculation as provided in table 4.)
in response to the determination that the first result is preferable to the second result and the third result, selecting the at least one value of the hyper parameter for the prospective training of the machine learning algorithm to be closer to the first group of values of hyper parameters than to the second group of values of hyper parameters and to the third group of values of hyper parameters according to the embedding in the mathematical space (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance as a mathematical computation achieved by 6 different network structures over 7 different datasets. Examiner notes that the broadest reasonable interpretation of “selecting the at least one value” includes selecting all values for prospective training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.) 
in response to the determination that the first result is not preferable to the second result and the third result, selecting the at least one value of the hyper parameter for the prospective training of the machine learning algorithm to be farther from the first group of values of hyper parameters than from at least one of the second group of values of hyper parameters and the third group of values of hyper parameters according to the embedding in the mathematical space (Koutsoukas, pg. 8, Table 4: “Performance achieved by DNN, NB, kNN, RF and SVM measured using MCC as evaluation metric.” Table 4 illustrates performance as a mathematical computation achieved by 6 different network structures over 7 different datasets. Examiner notes that the broadest reasonable interpretation of “determining” means to deduce as in the result of a mathematical calculation as provided in Table 4. Examiner additionally notes that the broadest reasonable interpretation of “selecting the at least one value” includes selecting all values for prospective training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.)


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3-5, 9-10, 12-13 and 16 are rejected under 35 U.S.C. 103.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Koutsoukas in view of Li et al. (“Adaptive Correlation Model for Visual Tracking Using Keypoints Matching and Deep Convolutional Feature”, 23 Feb. 2018, MDPI, Sensors 2018, 18, 653; hereinafter “Li”).

Regarding claim 3, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose:
the first group of values of hyper parameters comprises a first selection of a learning rate update method, the second group of values of hyper parameters comprises a second selection of a learning rate update method, the third group of values of hyper parameters comprises a third selection of a learning rate update method, and further comprising using the first result, the second result, the third result, the first selection of a learning rate update method, the second selection of a learning rate update method and the third selection of a learning rate update method to select a learning rate update method for the prospective training of the machine learning algorithm
However, Li teaches: 
the first group of values of hyper parameters comprises a first selection of a learning rate update method, the second group of values of hyper parameters comprises a second selection of a learning rate update method, the third group of values of hyper parameters comprises a third selection of a learning rate update method, and further comprising using the first result, the second result, the third result, the first selection of a learning rate update method, the second selection of a learning rate update method and the third selection of a learning rate update method to select a learning rate update method for the prospective training of the machine learning algorithm (Li, pg. 11-12, sec. 4.3.1 and Figure 8: “We compared our method with the updating method using peak to sidelobe ratio (PSR) and updating method using fixed learning rate” “Experimental results are shown in Figure 8. ACMD is our proposed method. PsrUpdate is the same as ACMD, except it uses PSR as an update criterion. No update indicates that it uses a fixed learning rate update method.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Li into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Li teaches algorithms and models for visual tracking. One of ordinary skill would have been motivated to combine the teachings of Li into Koutsoukas because Li shows that different update methods result in greater advantages (Li, pg. 11-12).

Regarding claim 12, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose:
the first group of values of hyper parameters comprises a first selection of a distance measure, the second group of values of hyper parameters comprises a second selection of a distance measure, the third group of values of hyper parameters comprises a third selection of a distance measure, and further comprising using the first result, the second result, the third result, the first selection of a distance measure, the second selection of a distance measure and the third selection of a distance measure to select a distance measure for the prospective training of the machine learning algorithm.
However, Li teaches: 
the first group of values of hyper parameters comprises a first selection of a distance measure, the second group of values of hyper parameters comprises a second selection of a distance measure, the third group of values of hyper parameters comprises a third selection of a distance measure, and further comprising using the first result, the second result, the third result, the first selection of a distance measure, the second selection of a distance measure and the third selection of a distance measure to select a distance measure for the prospective training of the machine learning algorithm (Li, pg. 6: “We use the Hamming distance to define the similarity between kcurr and kt-1… where i is the index of sub-element in the descriptor and vmax is the maximum Hamming distance.”; Examiner notes that the broadest reasonable interpretation of a distance measure includes a score that summarizes the relative difference between two objects, such that each individual score is a different distance measure. Examiner additionally notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Li into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Li teaches algorithms and models for visual tracking. One of ordinary skill would have been motivated to combine the teachings of Li into Koutsoukas because Li addresses the issue of continuous learning based on corrupt data by integrating a distance measure to define similarity (Li, pg. 6-7, sec. 3.3).

Claims 4-5, 9, 13 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Koutsoukas in view of Liu et al. (“Deep Hyperspherical Learning”, 2017, NIPS; hereinafter “Liu”).

Regarding claim 4, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose:
the first group of values of hyper parameters comprises a first batch size, the second group of values of hyper parameters comprises a second batch size, the third group of values of hyper parameters comprises a third batch size, and further comprising using the first result, the second result, the third result, the first batch size, the second batch size and the third batch size to select a batch size for the prospective training of the machine learning algorithm.
However, Liu teaches: 
the first group of values of hyper parameters comprises a first batch size, the second group of values of hyper parameters comprises a second batch size, the third group of values of hyper parameters comprises a third batch size, and further comprising using the first result, the second result, the third result, the first batch size, the second batch size and the third batch size to select a batch size for the prospective training of the machine learning algorithm (Liu, pg. 9, fig. 5: Figure 5 shows graphs for mini-batch size = 4, mini-batch size = 8, mini-batch size = 16, and mini-batch size = 32; examiner notes that the broadest reasonable interpretation of “batch size” includes entire batch of data or mini-batches of data).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Liu teaches a deep learning framework based on hyperspherical convolutions as an answer to various issues in neural network training. One of ordinary skill would have been motivated to combine the teachings of Liu into Koutsoukas because Liu addresses the additional training issue that arises when batch sizes are too small (Liu, pg. 8, sec. 4.4).

Regarding claim 5, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose:
the first group of values of hyper parameters comprises a first network structure, the second group of values of hyper parameters comprises a second network structure, the third group of values of hyper parameters comprises a third network structure, and further comprising using the first result, the second result, the third result, the first network structure, the second network structure and the third network structure to select a network structure for the prospective training of the machine learning algorithm
However, Liu teaches: 
the first group of values of hyper parameters comprises a first network structure, the second group of values of hyper parameters comprises a second network structure, the third group of values of hyper parameters comprises a third network structure, and further comprising using the first result, the second result, the third result, the first network structure, the second network structure and the third network structure to select a network structure for the prospective training of the machine learning algorithm (Liu, pg. 7 and Table 2: “We evaluate all the proposed SphereConv operators with the same architecture of different layers and a totally different architecture (ResNet)” Table 2 shows 6 different network architectures including CNN-3, CNN-9, CNN-18, CNN-45, CNN-60, and ResNet32. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Liu teaches a deep learning framework based on hyperspherical convolutions as an answer to various issues in neural network training. One of ordinary skill would have been motivated to combine the teachings of Liu into Koutsoukas because Liu addresses the issue of how operators may differ in performance when used in different network architectures, i.e., with different network structures (Liu, pg. 7).

Regarding claim 9, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose:
the first group of values of hyper parameters comprises a first selection of a loss function, the second group of values of hyper parameters comprises a second selection of a loss function, the third group of values of hyper parameters comprises a third selection of a loss function, and further comprising using the first result, the second result, the third result, the first selection of a loss function, the second selection of a loss function and the third selection of a loss function to select a loss function for the prospective training of the machine learning algorithm
However, Liu teaches: 
the first group of values of hyper parameters comprises a first selection of a loss function, the second group of values of hyper parameters comprises a second selection of a loss function, the third group of values of hyper parameters comprises a third selection of a loss function, and further comprising using the first result, the second result, the third result, the first selection of a loss function, the second selection of a loss function and the third selection of a loss function to select a loss function for the prospective training of the machine learning algorithm (Liu, pg. 7, Table 1: “Table 1: Classification accuracy (%) with different loss functions” Table 1 shows 8 different loss functions including Original Softmax, Sigmoid(0.1) W-Softmax, Sigmoid (0.3) W-Softmax, etc. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Liu teaches a deep learning framework based on hyperspherical convolutions as an answer to various issues in neural network training. One of ordinary skill would have been motivated to combine the teachings of Liu into Koutsoukas because Liu addresses the issue of how different loss functions may affect performance of a network (Liu, pg. 7).

Regarding claim 13, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose: 
the first group of values of hyper parameters comprises a first selection of a stopping condition, the second group of values of hyper parameters comprises a second selection of a stopping condition, the third group of values of hyper parameters comprises a third selection of a stopping condition, and further comprising using the first result, the second result, the third result, the first selection of a stopping condition, the second selection of a stopping condition and the third selection of a stopping condition to select a stopping condition for the prospective training of the machine learning algorithm.
However, Liu teaches: 
the first group of values of hyper parameters comprises a first selection of a stopping condition, the second group of values of hyper parameters comprises a second selection of a stopping condition, the third group of values of hyper parameters comprises a third selection of a stopping condition, and further comprising using the first result, the second result, the third result, the first selection of a stopping condition, the second selection of a stopping condition and the third selection of a stopping condition to select a stopping condition for the prospective training of the machine learning algorithm (Liu, pg. 6-7, sec. 4.1: “For CIFAR-10 and CIFAR-100, we use the ADAM, starting with the learning rate 0.001. The batch size is 128 if not specified. The learning rate is divided by 10 at 34K, 54K iterations and the training stops at 64K. For both A-Softmax and GA-Softmax loss we use m=4. For Imagenet-2012, we use the SGD with momentum 0.9. The learning rate starts with 0.1, and is divided by 10 at 200K and 375K iterations. The training stops at 550K iteration”. Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Liu teaches a deep learning framework based on hyperspherical convolutions as an answer to various issues in neural network training. One of ordinary skill would have been motivated to combine the teachings of Liu into Koutsoukas because Liu addresses the issue of how different stopping conditions may affect performance of a network (Liu, pg. 7).

Regarding claim 16, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose: 
training the machine learning algorithm using the first plurality of training examples and the first group of values of hyper parameters to generate a first inference model
using the first inference model and a plurality of testing examples to generate a first plurality of outputs 
providing information based on the first plurality of outputs to at least one user 
receiving one or more feedbacks related to the information based on the first plurality of outputs from the at least one user 
using the one or more feedbacks related to the information based on the first plurality of outputs to determine the first result 

However, Liu teaches: 
training the machine learning algorithm using the first plurality of training examples and the first group of values of hyper parameters to generate a first inference model (Liu, pg. 6-7, sec. 4.1 and Table 1: “We will first perform comprehensive ablation study and exploratory experiments for the proposed SphereNets, and then evaluate the SphereNets on image classification” Table 1 shows machine learning results of 8 different configurations in terms of accuracy.)
using the first inference model and a plurality of testing examples to generate a first plurality of outputs (Liu, pg. 7, sec. 4.1 and Table 1: “From the results in Table 1, one can observe that the SphereConv operators consistently outperforms the original convolutional operator.” Table 1 shows machine learning results of 8 different configurations in terms of accuracy. Examiner notes the broadest reasonable interpretation of an “inference model” is a machine learning model applied to data to produce an output). 
providing information based on the first plurality of outputs to at least one user (Liu, Table 1: Table 1 shows machine learning results of 8 different configurations in terms of accuracy. Examiner notes that the broadest reasonable interpretation of “user” includes individuals implementing and executing the training such that the user is provided the training output information in the form of training results). 
receiving one or more feedbacks related to the information based on the first plurality of outputs from the at least one user (Liu, pg. 7, sec. 4.2: “From the results in Table 1, one can observe that the SphereConv operators consistently outperforms the original convolutional operator” Table 1 shows machine learning results of 8 different configurations in terms of accuracy. Examiner notes that the broadest reasonable interpretation of “user” includes individuals implementing and executing the training such that the user is provided the training output information in the form of training results. Examiner additionally notes that the broadest reasonable interpretation of “feedbacks” means the transmission of evaluative information, including information regarding the performance of an operator).
using the one or more feedbacks related to the information based on the first plurality of outputs to determine the first result (Liu, pg. 7, sec. 4.2: “From the results in Table 1, one can observe that the SphereConv operators consistently outperforms the original convolutional operator” Table 1 shows machine learning results of 8 different configurations in terms of accuracy. Examiner notes that the broadest reasonable interpretation of “determine” includes ascertaining, from the performance, that one operator performed better in view of the feedback).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Liu teaches a deep learning framework based on hyperspherical convolutions as an answer to various issues in neural network training. One of ordinary skill would have been motivated to combine the teachings of Liu into Koutsoukas in order to integrate evaluation and analysis into neural network model training to achieve better results (Liu, pg. 7).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Koutsoukas in view of Kevin et al. (“The Effect of Hyperparameter Choice on ReLU and SELU Activation Function”, 31 Dec 2017, International Journal of Advanced Smart Convergence, Vol. 6, Issue 4; hereinafter “Kevin”).

Regarding claim 10, Koutsoukas teaches the method of claim 1 (above). Koutsoukas does not explicitly disclose: 
the first group of values of hyper parameters comprises a first selection of an initialization method, the second group of values of hyper parameters comprises a second selection of an initialization method, the third group of values of hyper parameters comprises a third selection of an initialization method, and further comprising using the first result, the second result, the third result, the first selection of an initialization method, the second selection of an initialization method and the third selection of an initialization method to select an initialization method for the prospective training of the machine learning algorithm
However, Kevin teaches: 
the first group of values of hyper parameters comprises a first selection of an initialization method, the second group of values of hyper parameters comprises a second selection of an initialization method, the third group of values of hyper parameters comprises a third selection of an initialization method, and further comprising using the first result, the second result, the third result, the first selection of an initialization method, the second selection of an initialization method and the third selection of an initialization method to select an initialization method for the prospective training of the machine learning algorithm (Kevin, pg. 73, abstract: “This paper evaluates the effect of Xavier and He initialization on Convolution Neural network with ReLU or SELU as activation function. The result can be seen in Table 1 and Table 2. Additionally, this paper also evaluates the initialization method that is used in Self-Normalizing Neural Networks (SNN)”; Examiner notes that the broadest reasonable interpretation of “using” means to employ or utilize in any way, including as baseline for selecting the same or different value in another iteration of training. Examiner additionally notes that the broadest reasonable interpretation of “prospective training” means planned training, including such planned training that is subsequently executed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kevin into Koutsoukas. Koutsoukas teaches comparison and performance of hyper-parameters in the context of modeling bioactivity data; Kevin teaches selection of hyperparameters and their effects on activation functions. One of ordinary skill would have been motivated to combine the teachings of Kevin into Koutsoukas in order to evaluate initialization methods which may have an effect on successful convergence (Kevin, pg. 74, sec. 3.1).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Delong (WO 2016/145516 A1) teaches a system for training a neural network with variance of hyper-parameters.
Julian et al. (WO 2016/122787 A1) teaches a method for selecting hyper-parameters for training deep neural networks via database.
Kobayashi et al. (US 20170061329 A1) teaches a machine learning management device that executes a plurality of algorithms using training data.
Loosli et al. (“Comments on the ‘Core Vector Machines: Fast SVM Training on Very Large Data Sets’”, 2007, Journal of Machine Learning Research 8, pp. 291-301) teaches how different hyper-parameters produce different behaviors in Core Vector Machines and Support Vector Machines.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SALLY T. NGUYEN whose telephone number is (571)272-3406. The examiner can normally be reached M-F 9:00am - 5:00pm Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amir Mehrmanesh, can be reached at (571) 270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/STN/Examiner, Art Unit 4163
/VIKER A LAMARDO/Primary Examiner, Art Unit 2126