Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-5, 7-12, 14-18, and 20 were amended. Claims 1-20 are pending and have been examined.

Response to Amendment
Upon further review, the objection regarding the specification is withdrawn.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot in light of the new rejection as listed below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
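A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.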


Claims 1-6, 8-13, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko (US 2014/0344193 A1), in view of Akaike (Likelihood and the Bayes procedure – 1998), further in view of Xu (US 2019/0065892 A1), and further in view of Bozdogan (MODEL SELECTION AND AKAIKE'S INFORMATION CRITERION (AIC) – 1987).

Regarding claim 1, Bilenko teaches a method comprising: identifying, by a processing device, a group of hyperparameters for configuring a machine-learning model ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers the numerous respective hyperparameters of the learning algorithm of the learner component to be the group of hyperparameters for configuring a machine-learning model).
determining, by the processing device, a number of hyperparameters in the group of hyperparameters ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers selecting numerous hyperparameters to identify their values to be the same as determining a number of hyperparameters).
adjusting one or more values of one or more hyperparameters in the group of hyperparameters to generate a respective version of the machine-learning model among the multiple versions of the machine-learning model ([0008-0009] For each candidate hyper-parameter configuration, the learning algorithm learns a respective predictive model based upon a set of training data, such that parameters of a predictive model are optimized based upon a respective hyper-parameter configuration of the learning algorithm and the set of training data. The examiner notes that Bilenko teaches the optimization of parameters based upon the configuration of hyperparameters).
Bilenko, however, fails to explicitly teach determining, by the processing device, descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined. Bilenko also fails to explicitly teach determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model. Bilenko also fails to explicitly teach training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model. Bilenko also fails to explicitly teach determining, by the processing device, that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; and executing, by the processing device, the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value.
On the other hand, Akaike teaches determining, by the processing device, descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined ([page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches the calculation of ABIC for multiple versions of a model to compare them. The examiner also notes that Akaike’s use of “number of adjusted hyperparameters” means that there are at least two versions (or more, depending on the number of hyperparameter adjustments) of the same model, one version with unadjusted hyperparameters, and a second version (or more) with the adjusted hyperparameters. The examiner also notes that Bilenko and Akaike are both considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko’s learning model to incorporate determining, by the processing device, descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined as taught by Akaike [page 0322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Akaike teaches determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model ([Page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches the calculation of ABIC for multiple models to compare them. The examiner notes that Akaike teaches the use of the number of adjusted hyperparameters and maximum marginal likelihood of multiple versions of a model to create their respective ABIC to compare them, with each hyperparameter adjustment causing a respective ABIC value to be calculated. The examiner also notes that Akaike’s use of “number of adjusted hyperparameters” means that there are at least two versions (or more, depending on the number of hyperparameter adjustments) of the same model, one version with unadjusted hyperparameters, and a second version (or more) with the adjusted hyperparameters. The examiner also notes that Bilenko and Akaike are both considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko’s learning model to incorporate determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model as taught by Akaike [Page 0322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Xu teaches training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model ([0042] The models generated at block 204 , Gskin and Gnon-skin using Eqn. 1 may represent the practical color distributions of image pixels in a skin dominant region (e.g., a facial region, a hand region, or the like) and a non–skin dominant region (e.g., a background region or the like). Such models may be used to determine a representative discriminative skin likelihood function P(i) as illustrated in FIG.2. The likelihood function P(i) may be provided to block 206 where pixel-wise skin detection is performed with real confidence. The examiner notes that Xu teaches in [0054] the use of multiple skin models each of which will require the determination of multiple likelihood functions. The examiner also notes that Xu teaches in [0065] the use of online training to obtain the likelihood function. The examiner also notes that Bilenko/Akaike and Xu are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike)’s learning model to incorporate training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model as taught by Xu [0054] to enhance the robustness and efficiency of image detection technologies [0001]).
Furthermore, Bozdogan teaches determining, by the processing device, that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; ([Page 356, Para. 4] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate determining, by the processing device, that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; as taught by Bozdogan [Page 356, Para. 4] to select the best model with the least complexity, or equivalently, the highest information gain [Page 356, Para. 4]).
Furthermore, Bozdogan teaches executing, by the processing device, the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; ([Page 352, Para. 2] Following Akaike (1973), the problem of statistical model identification can be formulated as the problem of selecting a model f(x | θk) based on n observations. The examiner notes that Bozdogan teaches the selection of the best statistical models (as shown above) to perform statistical data analysis. The examiner also notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate executing, by the processing device, the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; as taught by Bozdogan [Page 352, Para. 2] to select a best fit statistical model to perform statistical data analysis).
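For illustration only, the combination mapped to claim 1 can be sketched in code. The sketch below is not taken from any cited reference or from the application. It assumes the ABIC expression quoted from Akaike, treats each version's maximum log marginal likelihood as an already computed input, and uses hypothetical names and values (abic, select_lowest, version_a, and so on).

```python
def abic(max_log_marginal_likelihood: float, num_adjusted_hyperparameters: int) -> float:
    """ABIC = (-2) log(maximum marginal likelihood) + 2 (number of adjusted hyperparameters)."""
    return -2.0 * max_log_marginal_likelihood + 2.0 * num_adjusted_hyperparameters


def select_lowest(descriptors: dict) -> str:
    """Return the name of the version whose descriptor value is lowest."""
    return min(descriptors, key=descriptors.get)


# Hypothetical maximum log marginal likelihoods, one per trained version.
log_likelihoods = {"version_a": -120.5, "version_b": -118.0, "version_c": -119.2}
num_hyperparameters = 3  # assumed size of the hyperparameter group

# One descriptor value per version, then compare them to one another.
descriptors = {
    name: abic(log_lik, num_hyperparameters)
    for name, log_lik in log_likelihoods.items()
}
best_version = select_lowest(descriptors)
print(best_version, descriptors[best_version])
```

Because the number of hyperparameters is the same for every version in this sketch, the penalty term is constant and the ranking is driven by the likelihood term alone.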

Regarding claim 2, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 1, further comprising determining the respective descriptor value for the respective version of the machine-learning model by: multiplying a first constant value by the number of hyperparameters to determine a first value; multiplying a second constant value by a logarithm of a maximum value of the respective likelihood function for the respective version of the machine-learning model to determine a second value. determining the respective descriptor value for the respective version of the machine-learning model by adding the second value to the first value ([Page 0322, Para. 5 on Akaike] The general definition of ABIC of a 

Regarding claim 3, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 2, wherein the one or more hyperparameters include every hyperparameter in the group of hyperparameters ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner notes that Bilenko teaches numerous hyperparameters of the learning algorithm, which could include every hyperparameter).

Regarding claim 4, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 3, wherein the one or more hyperparameters include two or more hyperparameters ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner notes that Bilenko teaches numerous hyperparameters of the learning algorithm, which means two or more hyperparameters).

Regarding claim 5, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 1, wherein at least one hyperparameter in the group of hyperparameters does not affect a topology of the machine-learning model ([0002 on Bilenko] The learning algorithm itself has parameters, which are referred to herein as hyper-parameters. Exemplary hyper-parameters can include a learning rate of the learning algorithm, a regularization coefficient of the learning algorithm, preprocessing options, structural properties of a predictive model that is to be learned (e.g., a maximum number of leaves in a regression tree), etc.).

Regarding claim 6, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 1, wherein the machine-learning model is a first type of machine-learning model prior to executing the particular version of the machine-learning model to perform the task: determining another descriptor value for a second type of machine-learning model that is different from the first type of machine-learning model ([Page 353, Para. 3 on Bozdogan] Proposition I: Akaike's information criterion (AIC): Let {Mk: k = 1, 2, ... , K} be a set of competing models indexed by k = 1, 2, ... , K. Then the criterion AIC(k) = - 2 log L(θk) + 2k, which is minimized to choose a model Mk over the set of models is a natural sample estimator of twice the negentropy, 2E[J(θ*; θk)], or minus twice the expected log likelihood, - 2E[log f(X | θk)], of the true distribution with respect to a model with the parameters determined by the method of maximum likelihood. The examiner notes that Bozdogan teaches the calculation of AIC for multiple models).
determining that the lowest descriptor value associated with the first type of machine-learning model is lower than the other descriptor value associated with the second type of machine-learning model ([Page 356, Para. 4 on Bozdogan] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bozdogan [Page 353, Para. 3] teaches a procedure called the minimum AIC procedure that is used to select the one model among multiple models with the lowest AIC as being the best model).
based on determining that the lowest descriptor value is lower than the other descriptor value, selecting the particular version of the machine-learning model for performing the task. ([page 357, Para. 6 on Bozdogan] Without violating Akaike's principles, using the established results in mathematical statistics, we improve and extend AIC analytically in two ways. These extensions make AIC asymptotically consistent, and that we penalize overparameterization more stringently to pick the simplest of the true models whenever there is nothing to be lost in doing so).

Regarding claim 8, Bilenko teaches a non-transitory computer-readable medium comprising program code that is executable by a processing device for causing the processing device to identify a group of hyperparameters for configuring a machine-learning model ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers the numerous respective hyperparameters of the learning algorithm of the learner component to be the group of hyperparameters for configuring a machine-learning model).
determine a number of hyperparameters in the group of hyperparameters ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers selecting numerous hyperparameters to identify their values is the same as determining a number of hyperparameters.)
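adjusting one or more values of one or more hyperparameters in the group of hyperparameters to generate a respective version of the machine-learning model among the multiple versions of the machine-learning model ([0008-0009] For each candidate hyper-parameter configuration, the learning algorithm learns a respective predictive model based upon a set of training data, such that parameters of a predictive model are optimized based upon a respective hyper-parameter configuration of the learning algorithm and the set of training data. The examiner notes that Bilenko teaches the optimization of parameters based upon the configuration of hyperparameters).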
Bilenko, however, fails to explicitly teach determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined. Bilenko also fails to explicitly teach determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model. Bilenko also fails to explicitly teach training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model. Bilenko also fails to explicitly teach determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; and execute the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value.
On the other hand, Akaike teaches determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined ([Page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches the calculation of ABIC for multiple versions of a model to compare them. The examiner also notes that Akaike’s use of “number of adjusted hyperparameters” means that there are at least two versions (or more, depending on the number of hyperparameter adjustments) of the same model, one version with unadjusted hyperparameters, and a second version (or more) with the adjusted hyperparameters. The examiner also notes that Bilenko and Akaike are both considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko’s learning model to incorporate determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined as taught by Akaike [Page 322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Akaike teaches determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model ([Page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches the use of the number of adjusted hyperparameters and maximum marginal likelihood of multiple versions of a model to create their respective ABIC to compare them, with each hyperparameter adjustment causing a respective ABIC value to be calculated. The examiner also notes that Akaike’s use of “number of adjusted hyperparameters” means that there are at least two versions (or more, depending on the number of hyperparameter adjustments) of the same model, one version with unadjusted hyperparameters, and a second version (or more) with the adjusted hyperparameters. The examiner also notes that Bilenko and Akaike are both considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko’s learning model to incorporate determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model as taught by Akaike [Page 0322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Xu teaches training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model ([0042] The models generated at block 204 , Gskin and Gnon-skin using Eqn. 1 may represent the practical color distributions of image pixels in a skin dominant region (e.g., a facial region, a hand region, or the like) and a non–skin dominant region (e.g., a background region or the like). Such models may be used to determine a representative discriminative skin likelihood function P(i) as illustrated in FIG.2. The likelihood function P(i) may be provided to block 206 where pixel-wise skin detection is performed with real confidence. The examiner notes that Xu teaches in [0054] the use of multiple skin models each of which will require the determination of multiple likelihood functions. The examiner also notes that Xu teaches in [0065] the use of online training to obtain the likelihood function. The examiner also notes that Bilenko/Akaike and Xu are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike)’s learning model to incorporate training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model as taught by Xu [0054] to enhance the robustness and efficiency of image detection technologies [0001]).
Furthermore, Bozdogan teaches determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; ([Page 356, Para. 4] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; as taught by Bozdogan [Page 356, Para. 4] to select the best model with the least complexity, or equivalently, the highest information gain [Page 356, Para. 4]).
Furthermore, Bozdogan teaches execute the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; ([Page 352, Para. 2] Following Akaike (1973), the problem of statistical model identification can be formulated as the problem of selecting a model f(x | θk) based on n observations. The examiner notes that Bozdogan teaches the selection of the best statistical models (as shown above) to perform statistical data analysis. The examiner also notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate execute the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; as taught by Bozdogan [Page 352, Para. 2] to select a best fit statistical model to perform statistical data analysis).

Regarding claim 9, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, further comprising program code that is executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model by: multiplying a first constant value by the number of hyperparameters to determine a first value; multiplying a second constant value by a logarithm of a maximum value of the respective likelihood function for the respective version of the machine-learning model to determine a  second value. determining the respective descriptor value for the respective version of the machine-learning model by adding the second value to the first value ([Page 0322, Para. 5 on Akaike] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches two constants, one is a (-2) multiplied by the maximum marginal likelihood function and the other is a (2) multiplied by the number of adjusted hyper parameters. The examiner notes that Akaike teaches the calculation of 
Regarding claim 10, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, wherein the one or more hyperparameters include every hyperparameter in the group of hyperparameters ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner notes that Bilenko teaches numerous hyperparameters of the learning algorithm, which could include every hyperparameter).

Regarding claim 11, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, wherein the one or more hyperparameters include two or more hyperparameters ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner notes that Bilenko teaches numerous hyperparameters of the learning algorithm, which means two or more hyperparameters).

Regarding claim 12, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, wherein at least one hyperparameter in the group of hyperparameters does not affect a topology of the machine-learning model ([0002 on Bilenko] The learning algorithm itself has parameters, which are referred to herein as hyper-parameters. Exemplary hyper-parameters can include a learning rate of the learning algorithm, a regularization coefficient of the learning algorithm, preprocessing options, structural properties of a predictive model that is to be learned (e.g., a maximum number of leaves in a regression tree), etc.).

Regarding claim 13, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, wherein the machine-learning model is a first type of machine-learning model prior to executing the particular version of the machine-learning model to perform the task: determine another descriptor value for a second type of machine-learning model that is different from the first type of machine-learning model; ([Page 353, Para. 3 on Bozdogan] Proposition I: Akaike's information criterion (AIC): Let {Mk: k = 1, 2, ... , K} be a set of competing models indexed by k = 1, 2, ... , K. Then the criterion AIC(k) = - 2 log L(θk) + 2k, which is minimized to choose a model Mk over the set of models is a natural sample estimator of twice the negentropy, 2E[J(θ*; θk)], or minus twice the expected log likelihood, - 2E[log f(X | θk)], of the true distribution with respect to a model with the parameters
determine that the lowest descriptor value associated with the first type of machine-learning model is lower than the other descriptor value associated with the second type of machine-learning model, ([Page 356, Para. 4 on Bozdogan] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bozdogan [Page 353, Para. 3] teaches a procedure called the minimum AIC procedure that is used to select the one model among multiple models with the lowest AIC as being the best model).
based on determining that the lowest descriptor value is lower than the other descriptor value, select the particular version of the machine-learning model for performing the task. ([page 357, Para. 6 on Bozdogan] Without violating Akaike's principles, using the established results in mathematical statistics, we improve and extend AIC analytically in two ways. These extensions make AIC asymptotically consistent, and that we penalize overparameterization more stringently to pick the simplest of the true models whenever there is nothing to be lost in doing so).

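Regarding claim 15, Bilenko teaches A system comprising: a processing device; and a memory device on which instructions that are executable by the processing device are stored for causing the processing device to: Identify a group of hyperparameters for configuring a machine-learning model. ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers the numerous respective hyperparameters of the learning algorithm of the learning component to be the group of hyper parameters for configuring a machine learning model).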
determine a number of hyperparameters in the group of hyperparameters ([0024] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner considers selecting numerous hyperparameters to identify their values to be the same as determining a number of hyperparameters).
adjusting one or more values of one or more hyperparameters in the group of hyperparameters to generate a respective version of the machine-learning model among the multiple versions of the machine-learning model ([0008-0009] For each candidate hyper-parameter configuration, the learning algorithm learns a respective predictive model based upon a set of training data, such that parameters of a predictive model are optimized based upon a respective hyper-parameter configuration of the learning algorithm and the set of training data. The examiner notes that Bilenko teaches the optimization of parameters based upon the configuration of hyperparameters).
However, Bilenko fails to explicitly teach determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined. Bilenko also fails to explicitly teach determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model. Bilenko also fails to explicitly teach training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model. Bilenko also fails to explicitly teach determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; and executing, by the processing device, the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value.
On the other hand, Akaike teaches determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined ([Page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko’s learning model to incorporate determine descriptor values for multiple versions of the machine-learning model using the number of hyperparameters and likelihood functions corresponding to the multiple versions of the machine-learning model, each respective descriptor value among the descriptor values being determined as taught by Akaike [Page 0322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Akaike teaches determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model. ([Page 0322, Para. 5] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that determining the respective descriptor value for the respective version of the machine-learning model using the number of hyperparameters in the group of hyperparameters and the respective likelihood function for the respective version of the machine-learning model as taught by Akaike [Page 0322, Para. 5] to take advantage of the Bayesian approach over the conventional statistics [Page 321, Para. 2]).
Furthermore, Xu teaches training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model ([0042] The models generated at block 204 , Gskin and Gnon-skin using Eqn. 1 may represent the practical color distributions of image pixels in a skin dominant region (e.g., a facial region, a hand region, or the like) and a non–skin dominant region (e.g., a background training the respective version of the machine-learning model to determine a respective likelihood function among the likelihood functions for the respective version of the machine-learning model as taught by Xu [0054] to enhance the robustness and efficiency of image detection technologies [0001]).
Furthermore, Bozdogan teaches determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; ([Page 356, Para. 4] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate determine that a particular version of the machine-learning model has a lowest descriptor value among the descriptor values by comparing the descriptor values to one another; as taught by Bozdogan [Page 356, Para. 4] to select the best model with the least complexity, or equivalently, the highest information gain [Page 356, Para. 4]).
Furthermore, Bozdogan teaches and execute the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; ([Page 352, Para. 2] Following Akaike (1973), the problem of statistical model identification can be formulated as the problem of selecting a model f (x | θk) based on n observations. The examiner notes that Bozdogan teaches the selection of the best statistical models (as shown above) to perform statistical data analysis. The examiner also notes that Bilenko/Akaike/Xu and Bozdogan are considered to be analogous because they are in the same field of computational modeling. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu)’s learning model to incorporate and execute the particular version of the machine-learning model to perform a task in a computing environment based on the particular version of the machine-learning model having the lowest descriptor value; as taught by Bozdogan [Page 352, Para. 2] to select a best fit statistical model to perform statistical data analysis).

Regarding claim 16, Bilenko/Akaike/Xu/Bozdogan teaches The system of claim 15, wherein the memory device further comprises instructions that are executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model by: multiplying a first constant value by the number of hyperparameters to determine a first value; multiplying a second constant value by a logarithm of a maximum value of the respective likelihood function for the respective version of the machine-learning model to determine a second value; determining the respective descriptor value for the respective version of the machine-learning model by adding the second value to the first value ([Page 0322, Para. 5 on Akaike] The general definition of ABIC of a model with hyperparameters determined by the method of type II maximum likelihood would have been ABIC = (-2) log (maximum marginal likelihood) + 2 (number of adjusted hyperparameters). The examiner notes that Akaike teaches two constants, one is a (-2) multiplied by the maximum marginal likelihood function and the other is a (2) multiplied by the number of adjusted hyper parameters. The examiner notes that Akaike teaches the calculation of ABIC for multiple versions of a model to compare them. The examiner also notes that Akaike’s use of “number of adjusted hyperparameters” means that there are at least two versions (or more, depending on the number of hyperparameter adjustments) of the same model, one version with unadjusted hyperparameters, and a second version (or more) with the adjusted hyperparameters).
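
For illustration only, the arithmetic recited in claim 16 and in the quoted ABIC definition can be sketched as follows. This sketch is not taken from Akaike or from the application; the function name `descriptor_value`, its parameter names, and the example numbers are assumptions chosen for readability, with the constants defaulting to the (2) and (-2) of the quoted definition.

```python
import math


def descriptor_value(max_likelihood: float,
                     num_hyperparameters: int,
                     first_constant: float = 2.0,
                     second_constant: float = -2.0) -> float:
    """ABIC-style descriptor: second_constant * log(max likelihood)
    plus first_constant * (number of hyperparameters).

    All names here are hypothetical; only the formula follows the
    quoted ABIC definition.
    """
    if max_likelihood <= 0.0:
        raise ValueError("max_likelihood must be positive to take its logarithm")
    # First value: first constant times the number of hyperparameters.
    first_value = first_constant * num_hyperparameters
    # Second value: second constant times the log of the maximum likelihood.
    second_value = second_constant * math.log(max_likelihood)
    # Descriptor: the second value added to the first value.
    return first_value + second_value


# Example with made-up inputs: three hyperparameters, log-likelihood of -10.
example = descriptor_value(math.exp(-10.0), 3)
```

A lower value indicates a better trade-off between fit and the number of adjusted hyperparameters under this definition.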

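Regarding claim 17, Bilenko/Akaike/Xu/Bozdogan teaches The system of claim 15, wherein the one or more hyperparameters include every hyperparameter in the group of hyperparameters. ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108).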

Regarding claim 18, Bilenko/Akaike/Xu/Bozdogan teaches The system of claim 15, wherein the one or more hyperparameters include two or more hyperparameters. ([0024 on Bilenko] For instance, the candidate generator component 111 can utilize the Nelder Meade algorithm to identify candidate hyper-parameter values. Thus, a candidate hyper-parameter configuration can be a value for a hyper-parameter or values for numerous respective hyper-parameters of the learning algorithm of the learner component 108. The examiner notes that Bilenko teaches numerous hyperparameters of the learning algorithm which means two or more hyperparameters).

Regarding claim 19, Bilenko/Akaike/Xu/Bozdogan teaches The system of claim 15, wherein the machine-learning model is a first type of machine-learning model, and wherein the memory device further comprises instructions that are executable by the processing device for causing the processing device to, prior to executing the particular version of the machine-learning model to perform the task: determine another descriptor value for a second type of machine-learning model that is different from the first type of machine-learning model; ([Page 353, Para. 3 on Bozdogan] Proposition I: Akaike's information criterion (AIC): Let {Mk: k = 1, 2, ... , K} be a set of competing models indexed by k = 1, 2, ... , K. Then the criterion AIC(k) = - 2 log L(θk) + 2k, which is minimized to choose a model Mk over the set of models is a natural sample estimator of twice the negentropy, 2E[J(θ*; θk)], or minus twice the expected log likelihood, - 2E[log f (X | θk)], of the true distribution with respect to a model with the parameters determined by the method of maximum likelihood. The examiner notes that Bozdogan teaches the calculation of AIC for multiple models).
determine that the lowest descriptor value associated with the first type of machine-learning model is lower than the other descriptor value associated with the second type of machine-learning model, ([Page 356, Para. 4 on Bozdogan] This procedure is called the minimum AIC procedure and the model with the minimum AIC is called the minimum AIC estimate (MAICE) and is chosen to be the best model. The examiner notes that Bozdogan [Page 353, Para. 3] teaches a procedure called the minimum AIC procedure that is used to select the one model among multiple models with the lowest AIC as being the best model).
based on determining that the lowest descriptor value is lower than the other descriptor value, select the particular version of the machine-learning model for performing the task. ([page 357, Para. 6 on Bozdogan] Without violating Akaike's principles, using the established results in mathematical statistics, we improve and extend AIC analytically in two ways. These extensions make AIC asymptotically consistent, and that we penalize overparameterization more stringently to pick the simplest of the true models whenever there is nothing to be lost in doing so).
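
For illustration only, the minimum AIC procedure quoted from Bozdogan can be sketched as a comparison of AIC(k) = -2 log L(θk) + 2k across candidate models of different types. The helper names `aic` and `select_minimum_aic`, the candidate labels, and the log-likelihood values are assumptions made for this sketch and do not appear in the cited references.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """AIC(k) = -2 log L(theta_k) + 2k, as in the quoted Proposition I."""
    return -2.0 * log_likelihood + 2.0 * num_params


def select_minimum_aic(candidates: Dict[str, Tuple[float, int]]) -> Tuple[str, float]:
    """Return the (name, AIC) of the candidate with the lowest AIC.

    `candidates` maps a hypothetical model label to a pair of
    (maximized log-likelihood, number of parameters).
    """
    if not candidates:
        raise ValueError("at least one candidate model is required")
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Made-up values for two model types, used only to exercise the comparison.
candidates = {
    "first_type": (-120.0, 4),   # AIC = 240 + 8
    "second_type": (-125.0, 2),  # AIC = 250 + 4
}
best_name, best_score = select_minimum_aic(candidates)
```

With these made-up inputs the first type has the lower AIC and would be the one selected to perform the task.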

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko (US 2014/0344193 A1), in view of Akaike (Likelihood and the Bayes procedure – 1998), further in view of Xu (US 2019/0065892 A1), and further in view of Bozdogan (MODEL SELECTION AND AKAIKES INFORMATION CRITERION (AIC) – 1987), further in view of Wikipedia (Multi-objective optimization – 12/06/2017).

Regarding claim 7, Bilenko/Akaike/Xu/Bozdogan teaches The method of claim 1, however, it fails to explicitly teach wherein determining the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point.
On the other hand, Wikipedia teaches wherein determining the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point. ([Page11, Para2]: In the case of bi-objective problems, informing the decision maker concerning the Pareto front is usually carried out by its visualization (i.e. generating a Pareto surface in a graph): the Pareto front, often named the tradeoff curve in this case, can be drawn at the objective plane. The tradeoff curve gives full information on objective values and on objective tradeoffs, which inform how improving one objective is related to deteriorating the second one while moving along the tradeoff curve (i.e. having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis). The decision maker takes this information into account while specifying the preferred Pareto optimal objective point (i.e. determine the respective descriptor value based on the plot point). The examiner notes that Bilenko/Akaike/Xu/Bozdogan and Wikipedia are considered to be analogous because they are in the same field of data analysis.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu/Bozdogan)’s learning model to incorporate wherein determining the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point as taught by Wikipedia [Page1, Para1] to make an optimal decision in the presence of trade-offs between two conflicting objectives [Page.1, Para. 1]).
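
For illustration only, a bi-objective Pareto front over (model accuracy, number of hyperparameters) can be sketched as below. This is a generic non-dominated filter, not a method taken from Wikipedia, the other cited references, or the application; the function name `pareto_front` and the sample points are assumptions. Accuracy is treated as an objective to maximize and the number of hyperparameters as an objective to minimize.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (accuracy, number of hyperparameters)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by number of hyperparameters.

    A point is dominated when some other point is at least as accurate
    and uses no more hyperparameters, and is strictly better on at
    least one of the two objectives.
    """
    front = []
    for acc, k in points:
        dominated = any(
            (a >= acc and j <= k) and (a > acc or j < k)
            for a, j in points
        )
        if not dominated:
            front.append((acc, k))
    return sorted(front, key=lambda p: p[1])


# Made-up model versions, used only to exercise the filter.
versions = [(0.80, 2), (0.85, 3), (0.84, 5), (0.90, 6), (0.80, 4)]
front = pareto_front(versions)
```

Each point kept on the front represents a version for which accuracy cannot be improved without using more hyperparameters; a descriptor value could then be read from a chosen point on that curve.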

Regarding claim 14, Bilenko/Akaike/Xu/Bozdogan teaches The non-transitory computer-readable medium of claim 8, however, it fails to explicitly teach further comprising program code that is executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point.
On the other hand, Wikipedia teaches further comprising program code that is executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point. ([Page11, Para2]: In the case of bi-objective problems, informing the decision maker concerning the Pareto front is usually carried out by its visualization (i.e. generating a Pareto surface in a graph): the Pareto front, often named the tradeoff curve in this case, can be drawn at the objective plane. The tradeoff curve gives full information on objective values and on objective tradeoffs, which inform how improving one objective is related to deteriorating the second one while moving along the tradeoff curve (i.e. having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis). The decision maker takes this information into account while specifying the preferred Pareto optimal objective point (i.e. determine the respective descriptor value based on the plot point). The examiner notes that Bilenko/Akaike/Xu/Bozdogan and Wikipedia are considered to be analogous because they are in the same field of data analysis.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu/Bozdogan)’s learning model to incorporate further comprising program code that is executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point as taught by Wikipedia [Page1, Para1] to make an optimal decision in the presence of trade-offs between two conflicting objectives [Page.1, Para. 1]).

Regarding claim 20, Bilenko/Akaike/Xu/Bozdogan teaches The system of claim 15, however, it fails to explicitly teach wherein the memory device further comprises instructions that are executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point.
On the other hand, Wikipedia teaches wherein the memory device further comprises instructions that are executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point. ([Page11, Para2]: In the case of bi-objective problems, informing the decision maker concerning the Pareto front is usually carried out by its visualization (i.e. generating a Pareto surface in a graph): the Pareto front, often named the tradeoff curve in this case, can be drawn at the objective plane. The tradeoff curve gives full information on objective values and on objective tradeoffs, which inform how improving one objective is related to deteriorating the second one while moving along the tradeoff curve (i.e. having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis). The decision maker takes this information into account while specifying the preferred Pareto optimal objective point (i.e. determine the respective descriptor value based on the plot point). The examiner notes that Bilenko/Akaike/Xu/Bozdogan and Wikipedia are considered to be analogous because they are in the same field of data analysis. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified (Bilenko/Akaike/Xu/Bozdogan)’s learning model to incorporate wherein the memory device further comprises instructions that are executable by the processing device for causing the processing device to determine the respective descriptor value for the respective version of the machine-learning model comprises: generating a Pareto surface in a graph having (i) model accuracy along a first axis and (ii) the number of hyperparameters used to configure the machine-learning model along a second axis; determining a plot point on the Pareto surface in the graph; and determine the respective descriptor value based on the plot point as taught by Wikipedia [Page1, Para1] to make an optimal decision in the presence of trade-offs between two conflicting objectives [Page.1, Para. 1]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Elena - Going from Sum of Squared Errors to the Maximum Likelihood – 2016
Hoffmann (US 2017/0261949 A1)
Panchal - Searching Most Efficient Neural Network Architecture Using Akaike’s Information Criterion (AIC) – 2010

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY whose telephone number is (571)272-8824. The examiner can normally be reached Monday-Friday 7:30am-4:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/SHAMCY ALGHAZZY/Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128