Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial Office action issued in response to patent application 16/504,353, filed on 07/08/2019. Claims 1-20, as originally filed, are currently pending and have been considered below. Claims 1, 10, and 19 are independent claims.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/08/2019. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference characters "104" and "106" have both been used to designate the training set. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: reference number 100 in FIG. 1 and FIG. 2 is not mentioned in the specification.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a system, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a system for generating a set of Deep Learning (DL) models. Each of the following limitation(s):   
generate a set of estimated performance functions for each of the DL models in the initial set based on the set of edge-related metrics; 
generate a plurality of objective functions based on the set of estimated performance functions; 
generate a final DL model set based on the objective functions;  

as drafted, claim 1 is a machine that, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgement, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of the additional elements “a storage device” and “a processor to”, as drafted, recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the limitations “to store a training corpus comprising training data, a parameters vector, and a set of edge-related metrics” and “receive a user selection of a selected DL model from the final DL model set” can be considered mere data gathering. See MPEP 2106.05(g). Further, the limitation “deploy the selected DL model to an edge device”, as drafted, recites insignificant extra-solution activity because it relates to transmitting information for further processing. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere receiving and transmitting of data under MPEP 2106.05(d). Further, the additional element “train an initial set of DL models using the training data, wherein a topology of each of the DL models is determined based on the parameters vector”, as drafted, recites insignificant extra-solution activity because it is nominally or tangentially related to the claimed invention and does not integrate the abstract idea into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activities of “to store a training corpus comprising training data, a parameters vector, and a set of edge-related metrics” and “deploy the selected DL model to an edge device” are considered well known, routine, and conventional in view of MPEP 2106.05(d)(II). Further, the insignificant extra-solution activity of “receive a user selection of a selected DL model from the final DL model set” is considered well known, routine, and conventional in view of MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Finally, the insignificant extra-solution activity of “train an initial set of DL models using the training data, wherein a topology of each of the DL models is determined based on the parameters vector” is well-understood, routine, and conventional. MPEP 2106.05(d) notes that “[a] factual determination is required to support conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity.” Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).
Munro et al. (US 20160162456 A1), Para. [0163], “a conventional model training process” teaches a model training process as being conventional. The claim is not patentable. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 2,
Claim 2, dependent upon Claim 1, recites “evaluate performance of the DL models in the final DL model set to determine whether a performance predicted by the objective functions is in agreement with an actual performance of the DL models in the final DL model set”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements (other than the processor previously addressed).
Regarding claim 3,
Claim 3, dependent upon Claim 2, recites “adjust a topology of one of the models in the final DL model set upon a determination that the performance of the DL model predicted by the objective functions differs from the actual performance of the DL model by a threshold error criterion”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements (other than the processor previously addressed).
Regarding claim 4,
Claim 4, dependent upon Claim 1, recites “generate a user interface that enables a user to specify an objective and displays a ranked list of top ranked DL models ranked in accordance with the specified objective”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements. The recitation of “wherein to receive the user selection comprises to” can be considered mere data gathering. See MPEP 2106.05(g).
Regarding claim 5,
Claim 5, dependent upon Claim 4, recites “wherein each of the DL models make predictions based on common DL model input, with a final prediction to be determined based on a voting scheme”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements (other than the processor previously addressed). The recitation of “deploy a plurality of DL models to the edge device” recites insignificant extra-solution activity because it relates to transmitting information for further processing. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere receiving and transmitting of data under MPEP 2106.05(d).
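For illustration only, the kind of voting scheme recited in claim 5 can be sketched as a simple majority vote over predictions made on a common input. The function name `majority_vote` and the three toy threshold "models" are hypothetical and are not taken from the application or the cited art; the claim does not specify which voting scheme is used.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(models: Sequence[Callable[[float], Hashable]], x: float) -> Hashable:
    """Return the label predicted by the largest number of models for input x.

    Assumption: ties are broken in favor of the label that was predicted first,
    which is how Counter.most_common orders equal counts.
    """
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for deployed DL models: each is a threshold classifier.
toy_models = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

high = majority_vote(toy_models, 0.6)  # votes are [1, 1, 0]
low = majority_vote(toy_models, 0.4)   # votes are [1, 0, 0]
```

In this sketch each model sees the same input, and the final prediction is the most common individual prediction.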
Regarding claim 6,
Claim 6, dependent upon Claim 1, recites “wherein to generate the final DL model set based on the objective functions comprises to”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and “compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions”, which is a mathematical concept corresponding to a mathematical calculation. The claim does not recite any new additional elements.
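For illustration only, the Pareto front computation characterized above as a mathematical calculation can be sketched as follows. The sketch assumes each candidate model is summarized by a hypothetical pair (model size, accuracy), where a smaller size and a higher accuracy are preferred; the function name `pareto_front` and the sample values are assumptions and are not drawn from the application.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (model_size, accuracy); both values are illustrative


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point.

    Point j dominates point i when j is no larger in size and no lower in
    accuracy than i, and strictly better in at least one of the two.
    """
    front = []
    for i, (size_i, acc_i) in enumerate(points):
        dominated = any(
            size_j <= size_i
            and acc_j >= acc_i
            and (size_j < size_i or acc_j > acc_i)
            for j, (size_j, acc_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((size_i, acc_i))
    return sorted(front)


# Hypothetical candidate models.
candidates = [(10.0, 0.80), (20.0, 0.90), (30.0, 0.85), (40.0, 0.95), (15.0, 0.78)]
front = pareto_front(candidates)
# (30.0, 0.85) is dominated by (20.0, 0.90); (15.0, 0.78) is dominated by (10.0, 0.80).
```

The pairwise comparison is quadratic in the number of candidates, which is adequate for a sketch over a small model set.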
Regarding claim 7,
Claim 7, dependent upon Claim 1, does not recite any additional abstract ideas, and only recites additional elements “wherein the edge-related metrics comprise an inference time, a model size, and a test accuracy”, which can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h).
Regarding claim 8,
Claim 8, dependent upon Claim 1, does not recite any additional abstract ideas, and only recites additional elements “wherein the parameters vector comprises values describing a number of layers and a number of nodes per layer for each model in the initial set of DL models”. This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).
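For illustration only, a parameters vector of the kind recited in claim 8 (a number of layers and a number of nodes per layer) can be read as determining the topology of a fully connected network. The sketch below assumes a hypothetical encoding in which the vector lists the node count of each hidden layer; the names `layer_shapes` and `parameter_count` are not from the application.

```python
from typing import List, Tuple


def layer_shapes(n_inputs: int, hidden_nodes: List[int], n_outputs: int) -> List[Tuple[int, int]]:
    """Map a parameters vector to (fan_in, fan_out) shapes of dense layers.

    Assumption: hidden_nodes has one entry per hidden layer, so its length is
    the number of hidden layers and each entry is that layer's node count.
    """
    widths = [n_inputs, *hidden_nodes, n_outputs]
    return [(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]


def parameter_count(shapes: List[Tuple[int, int]]) -> int:
    """Count weights plus biases, a rough proxy for model size."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Hypothetical parameters vector: two hidden layers of eight nodes each.
shapes = layer_shapes(n_inputs=4, hidden_nodes=[8, 8], n_outputs=3)
size = parameter_count(shapes)
```

Under this assumed encoding, varying the vector yields an initial set of models with different topologies and sizes.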
Regarding claim 9,
Claim 9, dependent upon Claim 1, does not recite any additional abstract ideas, and only recites additional elements "wherein the selected DL model is a classifier". This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer or other machinery as a tool to perform an abstract idea. See MPEP 2106.05(f).
Regarding claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for generating a set of Deep Learning (DL) models. Each of the following limitation(s):   
generating a set of estimated performance functions for each of the DL models in the initial set based on a set of edge-related metrics; 
generating a plurality of objective functions based on the set of estimated performance functions; 
generating a final DL model set based on the objective functions

as drafted, claim 10 is a process that, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgement, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). Further, the limitation “receiving a user selection of a selected DL model from the final DL model set” can be considered mere data gathering. See MPEP 2106.05(g). Further, the limitation “deploying the selected DL model to an edge device”, as drafted, recites insignificant extra-solution activity because it relates to transmitting information for further processing. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere receiving and transmitting of data under MPEP 2106.05(d). Further, the additional element “training an initial set of Deep Learning (DL) models on training data, wherein a topology of each of the DL models is determined based on a parameters vector”, as drafted, recites insignificant extra-solution activity because it is nominally or tangentially related to the claimed invention and does not integrate the abstract idea into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “deploying the selected DL model to an edge device” is considered well known, routine, and conventional in view of MPEP 2106.05(d)(II). Further, the insignificant extra-solution activity of “receiving a user selection of a selected DL model from the final DL model set” is considered well known, routine, and conventional in view of MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Finally, the insignificant extra-solution activity of “training an initial set of Deep Learning (DL) models on training data, wherein a topology of each of the DL models is determined based on a parameters vector” is well-understood, routine, and conventional. MPEP 2106.05(d) notes that “[a] factual determination is required to support conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity.” Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).
Munro et al. (US 20160162456 A1), Para. [0163], “a conventional model training process” teaches a model training process as being conventional. The claim is not patentable. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 11,
Claim 11, dependent upon Claim 10, recites “evaluating performance of the DL models in the final DL model set to determine whether a performance predicted by the objective functions is in agreement with an actual performance of the DL models in the final DL model set”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements.
Regarding claim 12,
Claim 12, dependent upon Claim 11, recites “adjusting a topology of one of the models in the final DL model set upon a determination that the performance of the DL model predicted by the objective functions differs from the actual performance of the DL model by a threshold error criterion”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements.
Regarding claim 13,
Claim 13, dependent upon Claim 10, recites “generating a user interface that enables a user to specify an objective and displaying, at the user interface, a ranked list of top ranked DL models ranked in accordance with the specified objective”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements. The recitation of “wherein receiving the user selection comprises” can be considered mere data gathering. See MPEP 2106.05(g).
Regarding claim 14,
Claim 14, dependent upon Claim 10, recites “wherein each of the DL models make predictions based on common DL model input, with a final prediction to be determined based on a voting scheme”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements. The recitation of “deploying a plurality of DL models to the edge device” recites insignificant extra-solution activity because it relates to transmitting information for further processing. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere receiving and transmitting of data under MPEP 2106.05(d).
Regarding claim 15,
Claim 15, dependent upon Claim 10, recites “wherein generating the final DL model set based on the objective functions comprises to”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and “compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions”, which is a mathematical concept corresponding to a mathematical calculation. The claim does not recite any new additional elements.
Regarding claim 16,
Claim 16, dependent upon Claim 10, does not recite any additional abstract ideas, and only recites additional elements “wherein the edge-related metrics comprise an inference time, a model size, and a test accuracy”, which can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h).
Regarding claim 17,
Claim 17, dependent upon Claim 10, does not recite any additional abstract ideas, and only recites additional elements “wherein the parameters vector comprises values describing a number of layers and a number of nodes per layer for each model in the initial set of DL models”. This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).
Regarding claim 18,
Claim 18, dependent upon Claim 10, does not recite any additional abstract ideas, and only recites additional elements "wherein the selected DL model is a classifier". This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer or other machinery as a tool to perform an abstract idea. See MPEP 2106.05(f).
Regarding claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to a computer program product, which is directed to a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a computer program product for generating a set of Deep Learning (DL) models. Each of the following limitation(s):   
generate a set of estimated performance functions for each of the DL models in the initial set based on a set of edge-related metrics comprising an inference time, a model size, and a test accuracy; 
generate a plurality of objective functions based on the set of estimated performance functions; 
generate a final DL model set based on the objective functions

as drafted, claim 19 is a manufacture that, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgement, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of the additional element “a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and wherein the program instructions are executable by a processor to cause the processor to”, as drafted, recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the limitation “receive a user selection of a selected DL model from the final DL model set” can be considered mere data gathering. See MPEP 2106.05(g). Further, the limitation “deploy the selected DL model to an edge device”, as drafted, recites insignificant extra-solution activity because it relates to transmitting information for further processing. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere receiving and transmitting of data under MPEP 2106.05(d). Further, the additional element “train an initial set of DL models using training data, wherein a topology of each of the DL models is determined based on a parameters vector that specifies a number of layers and a number of nodes per layer for each model in the initial set of DL models”, as drafted, recites insignificant extra-solution activity because it is nominally or tangentially related to the claimed invention and does not integrate the abstract idea into a practical application. See MPEP 2106.05(g).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “deploy the selected DL model to an edge device” is considered well known, routine, and conventional in view of MPEP 2106.05(d)(II). Further, the insignificant extra-solution activity of “receive a user selection of a selected DL model from the final DL model set” is considered well known, routine, and conventional in view of MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Finally, the insignificant extra-solution activity of “train an initial set of DL models using training data, wherein a topology of each of the DL models is determined based on a parameters vector that specifies a number of layers and a number of nodes per layer for each model in the initial set of DL models” is well-understood, routine, and conventional. MPEP 2106.05(d) notes that “[a] factual determination is required to support conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity.” Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).
Munro et al. (US 20160162456 A1), Para. [0163], “a conventional model training process” teaches a model training process as being conventional. The claim is not patentable. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 20,
Claim 20, dependent upon Claim 19, recites “generate a user interface that enables a user to specify an objective and displays a ranked list of top ranked DL models ranked in accordance with the specified objective”, which is a mental process corresponding to an evaluation, judgement, or a combination thereof, and does not recite any new additional elements. The recitation of “wherein to receive the user selection comprises to” can be considered mere data gathering. See MPEP 2106.05(g).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rajarathinam et al. (US20200202179A1) in view of Gallardo (US20170270435A1).
Regarding Claim 1,
Rajarathinam et al. teaches a system for generating a set of Deep Learning (DL) models, comprising (Rajarathinam et al., Para. [0005], “In some embodiments, the logic circuitry may provide a set of two or more models” teaches a system that provides a set of models. Para. [0020], “Many embodiments create or identify a set of two or more models” teaches generating a set of models. Para. [0022], “The model methodology relates to the model type implemented such as a linear model, a non-linear model, and a deep learning model” teaches the set of models being a deep learning model). 
a storage device to store a training corpus comprising training data, a parameters vector, and a set of edge-related metrics, a processor to (Rajarathinam et al., Para. [0082], “database or other data storage; may associate portions of the transaction data with a training dataset 2152” teaches a data storage (corresponds to a storage device) that stores training dataset. Para. [0007], “the input features comprise a portion of or all the multiple data types in the training dataset” teaches feature vectors (corresponds to parameters vector) being part of the training data, thus in the storage. Para. [0049], “the processor(s) 1110 may execute instructions such as instructions” teaches the processor performing inference computation (corresponds to edge-related metrics) based on transaction data. Para. [0035], “A tree-type non-linear model may have hyperparameters such as the number of leaves or depth of a tree; the number of trees; the subsample rate; the quorum sample or number of features per tree” teaches a model size (corresponds to edge-related metrics). Para. [0028], “The assessed accuracy of the model may relate to the magnitudes of the residuals, the number of input features identified as contributors to the residual for a model, the degrees of freedom associated with a model, the chi-squared distribution associated with a model, a combination of one or more of these factors” teaches the model accuracy (corresponds to the edge-related metrics)).
train an initial set of DL models using the training data, wherein a topology of each of the DL models is determined based on the parameters vector (Rajarathinam et al., Para. [0005], “the logic circuitry may provide a set of two or more models, each model trained based on a training dataset and validated based on a testing dataset” teaches training a set of models (corresponds to the initial set of DL models) using training dataset. Para. [0007], “input features to input at an input layer of each model as a tensor, the input features comprise a portion of or all the multiple data types in the training dataset” teaches that inputting the features at the input layer as a tensor denotes a topology (required inputs) determined by the features (corresponds to the parameters vector)).
generate a set of estimated performance functions for each of the DL models in the initial set based on the set of edge-related metrics (Rajarathinam et al., Para. [0071], “the residual modelers 2042 through 2048 may output an indicator for each of the models 2020 through 2028 to indicate the overall performance of each of the models 2020 through 2028 such as the degrees of freedom and the chi-squared distribution” teaches determining the overall performance (corresponds to estimated performance functions based on the edge-related metrics) of each of the models. Para. [0052], “The key feature report may identify and/or explain critical variables (the input features) related to model underperformance (the residuals)” teaches critical variables (corresponds to the set) related to the model residuals (corresponds to the estimated performance function for each of the DL models)).
generate a plurality of objective functions based on the set of estimated performance functions (Rajarathinam et al., Para. [0081], “In some embodiments, the model identifier 2130 may randomly or pseudo randomly identify a model based on input features of the model, based on input features identified by a user, and/or based on input features associated with a testing schedule, and/or input features associated with models previously tested” teaches a plurality of objective functions to select the input features to use in the final DL model).
generate a final DL model set based on the objective functions (Rajarathinam et al., FIG. 2A and Para. [0068], “the residual modelers 2040 through 2048 may receive residuals output by the models 2020 through 2028, respectively, from objective function logic circuitry such as the objective function logic circuitry 1550” teaches generating a plurality of final models based on the objective function logic circuitry (corresponds to the objective functions)).
Rajarathinam et al. does not appear to explicitly teach receive a user selection of a selected DL model from the final DL model set; and deploy the selected DL model to an edge device.
However, Gallardo teaches receive a user selection of a selected DL model from the final DL model set (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields, select a set of default metrics for use in comparing performance of a plurality of fraud detection models, select a set of operators to be applied to the data, format the data for each selected operator, execute the selected operators, and determine a best model from the plurality of models based on the execution of the selected operators” teaches selecting the best model (corresponds to a selected DL model) from the plurality of models (corresponds to the final DL model set) based on the user-specified field (corresponds to the received user selection)).
deploy the selected DL model to an edge device (Gallardo, Para. [0104], “The best-suited medical claim predictive model is applied to unseen data to get fraud prediction results” teaches deploying the best-suited medical claim predictive model (corresponds to the selected DL model)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving a user selection of a selected DL model from the final DL model set and deploying the selected DL model to an edge device, as taught by Gallardo, into the system for generating a set of Deep Learning (DL) models of Rajarathinam et al. The motivation would have been to compute metrics based on domain knowledge, which can save computation and iteration time in deep learning and make the results more easily interpretable (Gallardo, Para. [0127], “compute metrics based on domain knowledge (deep learning models can determine useful metrics through analysis of the data, but it is still beneficial if the application can provide a base set of known metrics, as pre-computing metrics can save computation and iteration time in deep learning and can make the results more easily interpretable)”).
Regarding Claim 2,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1, 
The combination, as described in the rejection of claim 1, further teaches wherein the processor is to evaluate performance of the DL models in the final DL model set to determine whether a performance predicted by the objective functions is in agreement with an actual performance of the DL models in the final DL model set (Rajarathinam et al., Para. [0025], “After training and validating each of the models in the set of models, embodiments may test the set of models during a monitoring period with a monitoring period dataset” teaches testing the set of models, after training, with a monitoring period dataset).
Regarding Claim 3,
The Rajarathinam et al. and Gallardo combination of claim 2 teaches the system of claim 2, 
The combination, as described in the rejection of claim 2, further teaches wherein the processor is to adjust a topology of one of the models in the final DL model set upon a determination that the performance of the DL model predicted by the objective functions differs from the actual performance of the DL model by a threshold error criterion (Rajarathinam et al., Para. [0068], “The residual modelers 2040 through 2048 may receive data for input features to track the input data received at the input of each of the models 2020 through 2028. With the input data, the residual modelers 2040 through 2048 may correlate the input data related to the input features of each model with the residual from the model to detect a correlation, if any. In some embodiments, the residual modelers 2040 through 2048 may receive residuals output by the models 2020 through 2028, respectively, from objective function logic circuitry such as the objective function logic circuitry 1550 shown in FIG. 1C. In further embodiments, the residual modelers 2040 through 2048 may receive probabilities or predicted results output by the models 2020 through 2028, respectively, and determine residuals for each of the models 2020 through 2028” teaches training of the residual modelers with input features from objective function logic circuitry to determine predicted results (corresponds to performance) of the DL models. Para. [0070], “the residual modelers 2040 through 2048 may determine the list of input features by selecting input features that correlate with the residual of each model with a correlation value that meets or exceeds a correlation threshold” teaches a threshold error criterion. Para. [0027], “Residual modeling may use an input feature vector or tensor of a model and analyze the residuals with respect to each feature in the model over the monitoring period to determine a list of features that contribute to a residual of each model… such embodiments may generate a combined list of features from the set of models” teaches the residual modelers determining features (corresponds to affecting the topology). Para. [0028], “The highest ranked feature, for example, may be the feature that contributed to the residual or error in the results output by the most models in the set of models” teaches the features affecting the outputs).
Regarding Claim 4,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein to receive the user selection comprises to generate a user interface that enables a user to specify an objective and displays a ranked list of top ranked DL models ranked in accordance with the specified objective (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields” teaches selecting the best model from the plurality of models based on the user-specified field (corresponds to the received user selection). Para. [0102], “FIG. 4, at 3, Absolute Insight obtains the list of all artifacts (saved and/or shared models, risk scores—rankings, charts and dashboards) from a Database. At 4, Absolute Insight also gets a list of cached processes, data and lists them for the user” teaches displaying the ranked list of top ranked DL models. Para. [0306], “using the modeling screen, the user connects a data source with the Analyzer Operator via the graphical user interface” teaches generating a user interface that allows a user to specify an objective to the Analyzer Operator. Para. [0314], “the metrics for each of the optimal models produced by each of the algorithms are then generated, compared and ranked to choose the best model from the multiple models automatically produced by the Analyzer Operator” teaches the multiple models being ranked in accordance with the metrics for each of the optimal models produced (corresponds to the specified objective)).
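For illustration only, and forming no part of the disclosure of either cited reference, the claimed ranking of models in accordance with a user-specified objective may be sketched as follows. The function name rank_models, the metric names, and the sample values are hypothetical and are assumed solely for this sketch.

```python
from typing import Dict, List

# Hypothetical metric names (assumptions for this sketch).
# True means a lower value is better; False means a higher value is better.
LOWER_IS_BETTER = {
    "inference_time_ms": True,
    "model_size_mb": True,
    "test_accuracy": False,
}


def rank_models(models: Dict[str, Dict[str, float]],
                objective: str,
                top_k: int = 3) -> List[str]:
    """Return the names of the top_k models ordered by the chosen objective."""
    if objective not in LOWER_IS_BETTER:
        raise ValueError(f"unknown objective: {objective}")
    ascending = LOWER_IS_BETTER[objective]
    ordered = sorted(models,
                     key=lambda name: models[name][objective],
                     reverse=not ascending)
    return ordered[:top_k]


# Made-up candidate models with made-up metric values.
candidates = {
    "model_a": {"inference_time_ms": 12.0, "model_size_mb": 4.0, "test_accuracy": 0.91},
    "model_b": {"inference_time_ms": 30.0, "model_size_mb": 9.5, "test_accuracy": 0.95},
    "model_c": {"inference_time_ms": 7.5, "model_size_mb": 2.0, "test_accuracy": 0.88},
}

by_accuracy = rank_models(candidates, "test_accuracy")
by_latency = rank_models(candidates, "inference_time_ms", top_k=2)
```

In this sketch a user interface would pass the objective chosen by the user as the objective argument and display the returned list.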
Regarding Claim 5,
The Rajarathinam et al. and Gallardo combination of claim 4 teaches the system of claim 4,
The combination, as described in the rejection of claim 4, further teaches wherein the processor is to deploy a plurality of DL models to the edge device (Rajarathinam et al., Para. [0025], “After training and validating each of the models in the set of models, embodiments may test the set of models during a monitoring period with a monitoring period dataset” teaches testing the set of models, after training and deployment of the plurality of models, with a monitoring period dataset).
wherein each of the DL models make predictions based on common DL model input, with a final prediction to be determined based on a voting scheme (Gallardo, Para. [0004], “a healthcare fraud detection system comprises… a data input providing healthcare data, the data input being user selectable from at least one data source, the data input being coupled to the core processing system” teaches the common DL model input. Para. [0104], “If the analysis involves machine learning, such as Deep learning, then the Deep Learning Engine is called at 7, which accesses the distributed in-memory cache and executes the learning process to build medical claim fraud predictive models. The best-suited medical claim predictive model is applied to unseen data to get fraud prediction results” teaches each of the predictive models making predictions, with the fraud prediction results (corresponds to the final prediction) from the final model (the best-suited predictive model) being based on a voting scheme).
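For illustration only, and not drawn from either cited reference, one common form of voting scheme is a simple majority vote over the predictions that several models make on the same input. The function name majority_vote and the sample labels are hypothetical; other schemes, such as weighted voting, are equally possible.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the largest number of models.

    Ties are broken in favor of the label that appears first in
    `predictions`, because Counter.most_common keeps first-seen order
    among equal counts.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Three hypothetical models scoring the same input.
final_prediction = majority_vote(["fraud", "not_fraud", "fraud"])
```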
Regarding Claim 7,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the edge-related metrics comprise an inference time, a model size, and a test accuracy (Rajarathinam et al., Para. [0035], “For instance, the hyperparameters may include the number of latent factors in a matrix factorization” teaches an inference time (corresponds to edge-related metrics). Para. [0035], “A tree-type non-linear model may have hyperparameters such as the number of leaves or depth of a tree; the number of trees; the subsample rate; the quorum sample or number of features per tree” teaches a model size (corresponds to edge-related metrics). Para. [0028], “The assessed accuracy of the model may relate to the magnitudes of the residuals, the number of input features identified as contributors to the residual for a model, the degrees of freedom associated with a model, the chi-squared distribution associated with a model, a combination of one or more of these factors” teaches the model accuracy (corresponds to the edge-related metrics)).
Regarding Claim 8,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the parameters vector comprises values describing a number of layers and a number of nodes per layer for each model in the initial set of DL models (Rajarathinam et al., Para. [0035], “a deep learning model may have hyperparameters such as the number of hidden layers in a deep neural network; the number of neurons per layer; the number of epochs performed for training; the batch size; and/or the like” teaches the hyperparameters (corresponds to the parameters vector) comprising the number of hidden layers and the number of neurons per layer).
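For illustration only, and forming no part of either cited reference or of the application as filed, the relationship between a parameters vector and a model topology may be sketched as follows for a fully connected network in which every hidden layer has the same width. The function name layer_shapes and its arguments are hypothetical.

```python
from typing import List, Tuple


def layer_shapes(params: Tuple[int, int],
                 n_inputs: int,
                 n_outputs: int) -> List[Tuple[int, int]]:
    """Map (number of hidden layers, nodes per layer) to weight-matrix shapes.

    Each returned tuple is (fan_in, fan_out) for one fully connected layer.
    """
    n_layers, nodes_per_layer = params
    if n_layers < 0 or nodes_per_layer < 1:
        raise ValueError("invalid parameters vector")
    widths = [n_inputs] + [nodes_per_layer] * n_layers + [n_outputs]
    return list(zip(widths[:-1], widths[1:]))


# Two hidden layers of eight nodes each, four inputs, three output classes.
shapes = layer_shapes((2, 8), n_inputs=4, n_outputs=3)
```

Under these assumptions, changing either value in the parameters vector changes the number or size of the weight matrices, which is the sense in which the topology is determined by the vector.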
Regarding Claim 9,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the selected DL model is a classifier (Rajarathinam et al., Para. [0018], “Transaction data can be any type of data that is collected over time and includes trends that a model can learn through training for the purposes of determining a prediction or classification” teaches the deep learning model being a classifier). 
Regarding Claim 10,
Rajarathinam et al. teaches a method of generating a set of Deep Learning (DL) models, the method comprising (Rajarathinam et al., Para. [0005], “In some embodiments, the logic circuitry may provide a set of two or more models” teaches a method that provides a set of models. Para. [0020], “Many embodiments create or identify a set of two or more models” teaches generating a set of models. Para. [0022], “The model methodology relates to the model type implemented such as a linear model, a non-linear model, and a deep learning model” teaches the set of models being a deep learning model).
training an initial set of Deep Learning (DL) models on training data, wherein a topology of each of the DL models is determined based on a parameters vector (Rajarathinam et al., Para. [0005], “the logic circuitry may provide a set of two or more models, each model trained based on a training dataset and validated based on a testing dataset” teaches training a set of models (corresponds to the initial set of DL models) using a training dataset. Para. [0007], “the input features comprise a portion of or all the multiple data types in the training dataset” teaches input features that denote the topology (required inputs) as determined by the features/parameters vector).
generating a set of estimated performance functions for each of the DL models in the initial set based on a set of edge-related metrics (Rajarathinam et al., Para. [0071], “the residual modelers 2042 through 2048 may output an indicator for each of the models 2020 through 2028 to indicate the overall performance of each of the models 2020 through 2028 such as the degrees of freedom and the chi-squared distribution” teaches determining the overall performance (corresponds to estimated performance functions based on the edge-related metrics) of each of the models. Para. [0052], “The key feature report may identify and/or explain critical variables (the input features) related to model underperformance (the residuals)” teaches critical variables (corresponds to the set) related to the model residuals (corresponds to the estimated performance function for each of the DL models)).
generating a plurality of objective functions based on the set of estimated performance functions (Rajarathinam et al., Para. [0081], “In some embodiments, the model identifier 2130 may randomly or pseudo randomly identify a model based on input features of the model, based on input features identified by a user, and/or based on input features associated with a testing schedule, and/or input features associated with models previously tested” teaches a plurality of objective functions to select the input features to use in the final DL model).
generating a final DL model set based on the objective functions (Rajarathinam et al., FIG. 2A and Para. [0068], “the residual modelers 2040 through 2048 may receive residuals output by the models 2020 through 2028, respectively, from objective function logic circuitry such as the objective function logic circuitry 1550” teaches generating a plurality of final models based on the objective function logic circuitry (corresponds to the objective functions)).
Rajarathinam et al. does not appear to explicitly teach receiving a user selection of a selected DL model from the final DL model set; and deploying the selected DL model to an edge device.
However, Gallardo teaches receiving a user selection of a selected DL model from the final DL model set (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields, select a set of default metrics for use in comparing performance of a plurality of fraud detection models, select a set of operators to be applied to the data, format the data for each selected operator, execute the selected operators, and determine a best model from the plurality of models based on the execution of the selected operators” teaches selecting the best model (corresponds to a selected DL model) from the plurality of models (corresponds to the final DL model set) based on the user-specified field (corresponds to the received user selection)).
deploying the selected DL model to an edge device (Gallardo, Para. [0104], “The best-suited medical claim predictive model is applied to unseen data to get fraud prediction results” teaches deploying the best-suited medical claim predictive model (corresponds to the selected DL model)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving a user selection of a selected DL model from the final DL model set and deploying the selected DL model to an edge device, as taught by Gallardo, into the method of generating a set of Deep Learning (DL) models of Rajarathinam et al. The motivation would have been to compute metrics based on domain knowledge, which can save computation and iteration time in deep learning and make the results more easily interpretable (Gallardo, Para. [0127], “compute metrics based on domain knowledge (deep learning models can determine useful metrics through analysis of the data, but it is still beneficial if the application can provide a base set of known metrics, as pre-computing metrics can save computation and iteration time in deep learning and can make the results more easily interpretable)”).
Regarding Claim 11,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10, 
The combination, as described in the rejection of claim 10, further teaches comprising evaluating performance of the DL models in the final DL model set to determine whether a performance predicted by the objective functions is in agreement with an actual performance of the DL models in the final DL model set (Rajarathinam et al., Para. [0025], “After training and validating each of the models in the set of models, embodiments may test the set of models during a monitoring period with a monitoring period dataset” teaches testing the set of models, after training, with a monitoring period dataset).
Regarding Claim 12,
The Rajarathinam et al. and Gallardo combination of claim 11 teaches the method of claim 11, 
The combination, as described in the rejection of claim 11, further teaches comprising adjusting a topology of one of the models in the final DL model set upon a determination that the performance of the DL model predicted by the objective functions differs from the actual performance of the DL model by a threshold error criterion (Rajarathinam et al., Para. [0068], “The residual modelers 2040 through 2048 may receive data for input features to track the input data received at the input of each of the models 2020 through 2028. With the input data, the residual modelers 2040 through 2048 may correlate the input data related to the input features of each model with the residual from the model to detect a correlation, if any. In some embodiments, the residual modelers 2040 through 2048 may receive residuals output by the models 2020 through 2028, respectively, from objective function logic circuitry such as the objective function logic circuitry 1550 shown in FIG. 1C. In further embodiments, the residual modelers 2040 through 2048 may receive probabilities or predicted results output by the models 2020 through 2028, respectively, and determine residuals for each of the models 2020 through 2028” teaches training of the residual modelers with input features from objective function logic circuitry to determine predicted results (corresponds to performance) of the DL models. Para. [0070], “the residual modelers 2040 through 2048 may determine the list of input features by selecting input features that correlate with the residual of each model with a correlation value that meets or exceeds a correlation threshold” teaches a threshold error criterion. Para. [0027], “Residual modeling may use an input feature vector or tensor of a model and analyze the residuals with respect to each feature in the model over the monitoring period to determine a list of features that contribute to a residual of each model… such embodiments may generate a combined list of features from the set of models” teaches the residual modelers determining features (corresponds to affecting the topology). Para. [0028], “The highest ranked feature, for example, may be the feature that contributed to the residual or error in the results output by the most models in the set of models” teaches the features affecting the outputs).
Regarding Claim 13,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10,
The combination, as described in the rejection of claim 10, further teaches wherein receiving the user selection comprises generating a user interface that enables a user to specify an objective and displaying, at the user interface, a ranked list of top ranked DL models ranked in accordance with the specified objective (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields” teaches selecting the best model from the plurality of models based on the user-specified field (corresponds to the received user selection). Para. [0102], “FIG. 4, at 3, Absolute Insight obtains the list of all artifacts (saved and/or shared models, risk scores—rankings, charts and dashboards) from a Database. At 4, Absolute Insight also gets a list of cached processes, data and lists them for the user” teaches displaying the ranked list of top ranked DL models. Para. [0306], “using the modeling screen, the user connects a data source with the Analyzer Operator via the graphical user interface” teaches generating a user interface that allows a user to specify an objective to the Analyzer Operator. Para. [0314], “the metrics for each of the optimal models produced by each of the algorithms are then generated, compared and ranked to choose the best model from the multiple models automatically produced by the Analyzer Operator” teaches the multiple models being ranked in accordance with the metrics for each of the optimal models produced (corresponds to the specified objective)).
Regarding Claim 14,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10,
The combination, as described in the rejection of claim 10, further teaches comprising deploying a plurality of DL models to the edge device (Rajarathinam et al., Para. [0025], “After training and validating each of the models in the set of models, embodiments may test the set of models during a monitoring period with a monitoring period dataset” teaches testing the set of models, after training and deployment of the plurality of models, with a monitoring period dataset).
wherein each of the DL models make predictions based on common DL model input, with a final prediction to be determined based on a voting scheme (Gallardo, Para. [0004], “a healthcare fraud detection system comprises… a data input providing healthcare data, the data input being user selectable from at least one data source, the data input being coupled to the core processing system” teaches the common DL model input. Para. [0104], “If the analysis involves machine learning, such as Deep learning, then the Deep Learning Engine is called at 7, which accesses the distributed in-memory cache and executes the learning process to build medical claim fraud predictive models. The best-suited medical claim predictive model is applied to unseen data to get fraud prediction results” teaches each of the predictive models making predictions, with the fraud prediction results (corresponds to the final prediction) from the final model (the best-suited predictive model) being based on a voting scheme).
Regarding Claim 16,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10,
The combination, as described in the rejection of claim 10, further teaches wherein the edge-related metrics comprise an inference time, a model size, and a test accuracy (Rajarathinam et al., Para. [0035], “For instance, the hyperparameters may include the number of latent factors in a matrix factorization” teaches an inference time (corresponds to edge-related metrics). Para. [0035], “A tree-type non-linear model may have hyperparameters such as the number of leaves or depth of a tree; the number of trees; the subsample rate; the quorum sample or number of features per tree” teaches a model size (corresponds to edge-related metrics). Para. [0028], “The assessed accuracy of the model may relate to the magnitudes of the residuals, the number of input features identified as contributors to the residual for a model, the degrees of freedom associated with a model, the chi-squared distribution associated with a model, a combination of one or more of these factors” teaches the model accuracy (corresponds to the edge-related metrics)).
Regarding Claim 17,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10,
The combination, as described in the rejection of claim 10, further teaches wherein the parameters vector comprises values describing a number of layers and a number of nodes per layer for each model in the initial set of DL models (Rajarathinam et al., Para. [0035], “a deep learning model may have hyperparameters such as the number of hidden layers in a deep neural network; the number of neurons per layer; the number of epochs performed for training; the batch size; and/or the like” teaches the hyperparameters (corresponds to the parameters vector) comprising the number of hidden layers and the number of neurons per layer).
Regarding Claim 18,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the method of claim 10,
The combination, as described in the rejection of claim 10, further teaches wherein the selected DL model is a classifier (Gallardo, Para. [0117], “Absolute Insight Deep learning may: perform dimension reduction, classifier” teaches the deep learning model being a classifier). 
Regarding Claim 19,
Rajarathinam et al. teaches a computer program product for generating a set of Deep Learning (DL) models comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and wherein the program instructions are executable by a processor to cause the processor to (Rajarathinam et al., Para. [0005], “In some embodiments, the logic circuitry may provide a set of two or more models” teaches a storage media that provides a set of models. Para. [0118], “Examples of software elements, which may reside in the storage medium 6020, may include… computer programs” teaches the computer program product. Para. [0020], “Many embodiments create or identify a set of two or more models” teaches generating a set of models. Para. [0022], “The model methodology relates to the model type implemented such as a linear model, a non-linear model, and a deep learning model” teaches the set of models being a deep learning model. Para. [0116], “storage medium 5000 may include any non-transitory computer readable medium or machine readable medium, such as an optical, magnetic or semiconductor storage. Storage medium 5000 may store various types of computer executable instructions, such as instructions to implement logic flows and/or techniques described herein” teaches a non-transitory computer readable medium having program instructions executable by a processor).
train an initial set of DL models using training data, wherein a topology of each of the DL models is determined based on a parameters vector that specifies a number of layers and a number of nodes per layer for each model in the initial set of DL models (Rajarathinam et al., Para. [0005], “the logic circuitry may provide a set of two or more models, each model trained based on a training dataset and validated based on a testing dataset” teaches training a set of models (corresponds to the initial set of DL models) using a training dataset. Para. [0007], “the input features comprise a portion of or all the multiple data types in the training dataset” teaches input features that denote the topology (required inputs) as determined by the features/parameters vector).
generate a set of estimated performance functions for each of the DL models in the initial set based on a set of edge-related metrics comprising an inference time, a model size, and a test accuracy (Rajarathinam et al., Para. [0071], “the residual modelers 2042 through 2048 may output an indicator for each of the models 2020 through 2028 to indicate the overall performance of each of the models 2020 through 2028 such as the degrees of freedom and the chi-squared distribution” teaches determining the overall performance (corresponds to estimated performance functions based on the edge-related metrics) of each of the models. Para. [0052], “The key feature report may identify and/or explain critical variables (the input features) related to model underperformance (the residuals)” teaches critical variables (corresponds to the set) related to the model residuals (corresponds to the estimated performance function for each of the DL models). Para. [0035], “For instance, the hyperparameters may include the number of latent factors in a matrix factorization” teaches an inference time (corresponds to edge-related metrics). Para. [0035], “A tree-type non-linear model may have hyperparameters such as the number of leaves or depth of a tree; the number of trees; the subsample rate; the quorum sample or number of features per tree” teaches a model size (corresponds to edge-related metrics). Para. [0028], “The assessed accuracy of the model may relate to the magnitudes of the residuals, the number of input features identified as contributors to the residual for a model, the degrees of freedom associated with a model, the chi-squared distribution associated with a model, a combination of one or more of these factors” teaches the model accuracy (corresponds to the edge-related metrics)).
generate a plurality of objective functions based on the set of estimated performance functions (Rajarathinam et al., Para. [0081], “In some embodiments, the model identifier 2130 may randomly or pseudo randomly identify a model based on input features of the model, based on input features identified by a user, and/or based on input features associated with a testing schedule, and/or input features associated with models previously tested” teaches a plurality of objective functions to select the input features to use in the final DL model).
generate a final DL model set based on the objective functions (Rajarathinam et al., FIG. 2A and Para. [0068], “the residual modelers 2040 through 2048 may receive residuals output by the models 2020 through 2028, respectively, from objective function logic circuitry such as the objective function logic circuitry 1550” teaches generating a plurality of final models based on the objective function logic circuitry (corresponds to the objective functions)).
Rajarathinam et al. does not appear to explicitly teach receive a user selection of a selected DL model from the final DL model set; and deploy the selected DL model to an edge device.
However, Gallardo teaches receive a user selection of a selected DL model from the final DL model set (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields, select a set of default metrics for use in comparing performance of a plurality of fraud detection models, select a set of operators to be applied to the data, format the data for each selected operator, execute the selected operators, and determine a best model from the plurality of models based on the execution of the selected operators” teaches selecting the best model (corresponds to a selected DL model) from the plurality of models (corresponds to the final DL model set) based on the user-specified field (corresponds to the received user selection)).
deploy the selected DL model to an edge device (Gallardo, Para. [0104], “The best-suited medical claim predictive model is applied to unseen data to get fraud prediction results” teaches deploying the best-suited medical claim predictive model (corresponds to the selected DL model)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to receive a user selection of a selected DL model from the final DL model set and deploy the selected DL model to an edge device, as taught by Gallardo, in the system for generating a set of Deep Learning (DL) models of Rajarathinam et al. The motivation would have been to compute metrics based on domain knowledge, which can save computation and iteration time in deep learning and make the results more easily interpretable (Gallardo, Para. [0127], “compute metrics based on domain knowledge (deep learning models can determine useful metrics through analysis of the data, but it is still beneficial if the application can provide a base set of known metrics, as pre-computing metrics can save computation and iteration time in deep learning and can make the results more easily interpretable)”).
Regarding Claim 20,
The Rajarathinam et al. and Gallardo combination of claim 19 teaches the computer program product of claim 19,
The combination, as described in the rejection of claim 19, further teaches wherein to receive the user selection comprises to generate a user interface that enables a user to specify an objective and displays a ranked list of top ranked DL models ranked in accordance with the specified objective (Gallardo, Para. [0007], “perform data cleansing on a set of user-specified fields” teaches selecting the best model from the plurality of models based on the user-specified fields (corresponds to the received user selection). Para. [0102], “FIG. 4, at 3, Absolute Insight obtains the list of all artifacts (saved and/or shared models, risk scores—rankings, charts and dashboards) from a Database. At 4, Absolute Insight also gets a list of cached processes, data and lists them for the user” teaches displaying the ranked list of top ranked DL models. Para. [0306], “using the modeling screen, the user connects a data source with the Analyzer Operator via the graphical user interface” teaches generating a user interface that allows the user to specify an objective to the Analyzer Operator. Para. [0314], “the metrics for each of the optimal models produced by each of the algorithms are then generated, compared and ranked to choose the best model from the multiple models automatically produced by the Analyzer Operator” teaches the multiple models ranked in accordance with the metrics for each of the optimal models produced (corresponds to the specified objective)).
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Rajarathinam et al. in view of Gallardo in further view of Shah et al. (“Pareto Frontier Learning with Expensive Correlated Objectives”).
Regarding Claim 6,
The Rajarathinam et al. and Gallardo combination of claim 1 teaches the system of claim 1,
Rajarathinam et al. and Gallardo do not appear to explicitly teach wherein to generate the final DL model set based on the objective functions comprises to compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions.
However, Shah et al., teaches wherein to generate the final DL model set based on the objective functions comprises to compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions (Shah et al., Section 2.2 Pg. 3, “Analogous to the single objective case, we can formulate a multi objective Bayesian optimization problem as maximizing a future reward, $r_T = [\mathrm{Vol}_{v_{ref}}(P(\acute{Y}_T)) - \mathrm{Vol}_{v_{ref}}(P(Y^*))]$, where $Y^*$ is the true Pareto frontier and $\acute{Y}_T$ is the suggested Pareto frontier after T evaluations of each of the objectives” teaches determining the Pareto frontier of the DL model compared to the computed objective functions of the DL performance).
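For illustration only, the hypervolume-based reward quoted from Shah et al. above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of Shah et al. or of the instant application: it assumes exactly two objectives, both maximized, and a reference point v_ref that lies below and to the left of the points of interest. The function names pareto_front, hypervolume_2d and reward are hypothetical and do not appear in any cited reference.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (assumption: both objectives are maximized)."""
    front = []
    for p in points:
        # p is dominated if a different point is at least as good in both objectives.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    # Remove duplicates and return in a deterministic order.
    return sorted(set(front))


def hypervolume_2d(points: List[Point], v_ref: Point) -> float:
    """Area dominated by the Pareto front of `points` relative to `v_ref`."""
    # Only points that strictly improve on the reference point contribute area.
    front = [
        p for p in pareto_front(points)
        if p[0] > v_ref[0] and p[1] > v_ref[1]
    ]
    # Sorted by the first objective descending, the second objective ascends.
    front.sort(key=lambda p: p[0], reverse=True)
    volume = 0.0
    prev_y = v_ref[1]
    for x, y in front:
        # Add the horizontal strip between the previous and current y level.
        volume += (x - v_ref[0]) * (y - prev_y)
        prev_y = y
    return volume


def reward(suggested: List[Point], true_front: List[Point], v_ref: Point) -> float:
    """Hypervolume of the suggested front minus hypervolume of the true front."""
    return hypervolume_2d(suggested, v_ref) - hypervolume_2d(true_front, v_ref)


if __name__ == "__main__":
    # Hypothetical data: (objective 1, objective 2) pairs for candidate models.
    true_points = [(1, 3), (2, 2), (3, 1)]
    suggested_points = [(1, 3), (3, 1), (1, 1)]
    print(pareto_front(suggested_points))
    print(reward(suggested_points, true_points, (0, 0)))
```

Under these assumptions the reward is never positive, because the suggested front cannot dominate more area than the true front; it reaches zero only when the suggested front covers the same area as the true front.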
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate the final DL model set based on the objective functions by computing a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions, as taught by Shah et al., in the system for generating a set of Deep Learning (DL) models of Rajarathinam et al. in view of Gallardo. The motivation would have been to overcome the problem of intractable integrals while consistently outperforming competing models (Shah et al., Section 5 Pg. 7, “In this paper, we argue that modelling correlations amongst objectives in multi-objective Pareto optimization problems is important for success. To overcome the problem of intractable integrals, we devise a novel approximation which leads to an analytic and differentiable approximation to the expected increase in Pareto hypervolume acquisition function. Two forms of correlated output GP models are implemented on a variety of multi-objective problems, and seem to consistently outperform competing models which model objectives as being independent”).
Regarding Claim 15,
The Rajarathinam et al. and Gallardo combination of claim 10 teaches the system of claim 10,
Rajarathinam et al. and Gallardo do not appear to explicitly teach wherein generating the final DL model set based on the objective functions comprises to compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions.
However, Shah et al., teaches wherein generating the final DL model set based on the objective functions comprises to compute a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions (Shah et al., Section 2.2 Pg. 3, “Analogous to the single objective case, we can formulate a multi objective Bayesian optimization problem as maximizing a future reward, $r_T = [\mathrm{Vol}_{v_{ref}}(P(\acute{Y}_T)) - \mathrm{Vol}_{v_{ref}}(P(Y^*))]$, where $Y^*$ is the true Pareto frontier and $\acute{Y}_T$ is the suggested Pareto frontier after T evaluations of each of the objectives” teaches determining the Pareto frontier of the DL model compared to the computed objective functions of the DL performance).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate the final DL model set based on the objective functions by computing a Pareto front of a plot of DL model parameters versus DL performance as computed by the objective functions, as taught by Shah et al., in the system for generating a set of Deep Learning (DL) models of Rajarathinam et al. in view of Gallardo. The motivation would have been to overcome the problem of intractable integrals while consistently outperforming competing models (Shah et al., Section 5 Pg. 7, “In this paper, we argue that modelling correlations amongst objectives in multi-objective Pareto optimization problems is important for success. To overcome the problem of intractable integrals, we devise a novel approximation which leads to an analytic and differentiable approximation to the expected increase in Pareto hypervolume acquisition function. Two forms of correlated output GP models are implemented on a variety of multi-objective problems, and seem to consistently outperform competing models which model objectives as being independent”).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/Examiner, Art Unit 2125        

/BRIAN M SMITH/Primary Examiner, Art Unit 2122