DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 10/17/2018 and the Remarks and Amendments filed on 11/10/2020.  

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
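If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.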
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding independent claims 1, 12, and 20, each recites the limitation “automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type or a similar output type”.  This limitation is unclear.  How can one generate clusters of models where the models themselves have a similar input or output type?  It seems that the data that the models receive as input and produce as output can be clustered according to a similarity, but not the input or output type of the models.  Paragraph [0030] of the originally filed specification discloses “Model clusterer 130 may include one or more computing systems configured to collect models applied to the same or similar (e.g., same data type, same type of having at least one of a similar input type or a similar output type.  Please explain what this means.  For examination purposes, this limitation in independent claims 1, 12, and 20 will be interpreted to mean that models are clustered or grouped together based on the models being in any way similar.  Appropriate correction is required.  Dependent claims 2-11 and 13-19 are rejected under 35 U.S.C. 112(b) as being indefinite by virtue of their dependency on indefinite independent claims 1 and 12.
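
For illustration only, one reading of the limitation discussed above (models grouped together because they share an input type or an output type) can be sketched as follows. This is a minimal sketch and not a construction of the claims: the `ModelInfo` record, its field names, the `cluster_by_type` helper, and the sample models are all hypothetical and are not taken from the application or from any cited reference.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical descriptor of a model; the fields are illustrative assumptions.
    name: str
    input_type: str
    output_type: str


def cluster_by_type(models, attribute="input_type"):
    """Group model names whose chosen attribute (input or output type) is identical."""
    clusters = defaultdict(list)
    for model in models:
        clusters[getattr(model, attribute)].append(model.name)
    return dict(clusters)


# Hypothetical sample models used only to exercise the sketch.
models = [
    ModelInfo("m1", "text", "label"),
    ModelInfo("m2", "text", "score"),
    ModelInfo("m3", "image", "label"),
]
by_input = cluster_by_type(models, "input_type")
by_output = cluster_by_type(models, "output_type")
```

Under this reading, the same model can fall into different groups depending on whether the input type or the output type is used as the grouping attribute.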

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, and 5-18 are rejected under 35 U.S.C. § 103 as being obvious over Hinton et al. (Hinton et al., “Distilling the Knowledge in a Neural Network”, Mar. 9, 2015, NIPS 2014 Deep Learning Workshop, pp. 1-9, hereinafter “Hinton”) in view of Andoni et al. (US 20200210847 A1, hereinafter “Andoni”).

Regarding claim 1, Hinton discloses [a] system for generating a singular ensemble model, comprising: (Abstract; “We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse”, suggesting the system for generating a singular ensemble model)
a model generator configured to: obtain information from at least one of a database, a client device, or an input recognizer, and generate a plurality of machine learning models based on the obtained information, the models having a plurality of layers, a plurality of nodes, and a model type; (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer, the processor being, under a broadest reasonable interpretation of the claim language read in view of paragraph [0025] of the originally filed specification and as interpreted under 35 USC § 112(f), a model generator or general purpose computer; and Abstract; "an ensemble of models", suggesting generating a plurality of machine learning models; and Page 5, §4.1; “We trained 10 separate models to predict P”, suggesting obtaining and generating a plurality of machine learning models based on obtained information; and Page 7, §5.5; “we trained 61 specialist models, each with 300 classes (plus the dustbin class)”, further suggesting obtaining from a database and generating the models; and Page 5, §5.1; “JFT is an internal Google dataset that has 100 million labeled images with 15,000 labels. When we did this work, Google’s baseline model for JFT was a deep convolutional neural network [7] that had been trained for about six months using asynchronous stochastic gradient descent on a large number of cores”, suggesting obtaining from a database; and Page 3, §3; the section discloses that the models have a plurality of layers (hidden layers), nodes, and a model type (neural net))
at least one processor; and at least one storage medium storing instructions that, when executed, configure the processor to perform operations comprising: (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer)
obtaining, from the model clusterer, a plurality of the clustered machine learning models (Abstract; "an ensemble of models", suggesting obtaining a plurality of machine learning models)
obtaining a training data set consistent with the clustered models from at least one of the database, client device, or the input recognizer (Page 4, Section 4, Paragraph 1)
applying the clustered models to the training data set to obtain outputs associated with the clustered models (Page 4, Section 4; the ensemble of models are applied to the training set to obtain outputs)
mapping the outputs to features of the clustered models (Page 4, Section 4; "our acoustic model P which maps acoustic observations at time t", suggesting mapping outputs to features of the models (acoustic models))
combining the mapped features into a singular machine learning model (Abstract; "we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model"; and Page 4, Section 4; "distilling an ensemble of models into a single model that works significantly better than a model of the same size that is learned directly from the same training data")
training the singular machine learning model using the training data set (Page 4, Section 4; “distilling an ensemble of models into a single model that works significantly better than a model of the same size that is learned directly from the same training data", suggesting training the singular machine learning model (distilled model))
and outputting the trained singular machine learning model (Page 5, Table 1; the table shows an output of the analysis).
Hinton fails to explicitly disclose a model clusterer configured to: obtain the machine learning models from the model generator, and cluster models by input or output type according to same or similar machine learning models; the clustered machine learning models.
	Andoni discloses a model clusterer configured to: obtain the models from the model generator; and automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type or a similar output type (Claim 17; “The method of claim 11, further comprising: grouping models of a plurality of models into species based on similarities between the models”, which discloses, under a broadest reasonable interpretation of the claim language in view of the specification at [0030] and in view of the 112(b) indefiniteness rejection above, obtaining models from the model generator and automatically generating clusters or groups of the models based on the models being similar; and [0100]; “The method 1000 may include grouping the models of the plurality of models into species based on genetic distance between models”; and [0095]; “determining, by a processor of a computing device, a subset of model”, which discloses obtaining the models from the model generator to then cluster or group them by similarity)
the clustered machine learning models (Claim 17; “The method of claim 11, further comprising: grouping models of a plurality of models into species based on similarities between the models”).
Hinton and Andoni are analogous art because both are concerned with ensemble learning for machine learning applications.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in ensemble learning to combine the model clustering of Andoni with the system of Hinton to yield the predictable result of a model clusterer configured to: obtain the models from the model generator; and automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type or a similar output type. The 

Regarding claim 12, Hinton discloses [a] system for generating a singular ensemble model, comprising: (Abstract; “We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse”, suggesting the system for generating a singular ensemble model)
a model generator configured to: obtain information from at least one of a database, a client device, or an input recognizer; and generate a plurality of machine learning models based on the obtained information (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer, the processor being, under a broadest reasonable interpretation of the claim language read in view of paragraph [0025] of the originally filed specification and as interpreted under 35 USC § 112(f), a model generator or general purpose computer; and Abstract; "an ensemble of models", suggesting generating a plurality of machine learning models; and Page 5, §4.1; “We trained 10 separate models to predict P”, suggesting obtaining and generating a plurality of machine learning models based on obtained information; and Page 7, §5.5; “we trained 61 specialist models, each with 300 classes (plus the dustbin class)”, further suggesting obtaining from a database and generating the models; and Page 5, §5.1; “JFT is an internal Google dataset that has 100 million labeled images with 15,000 labels. When we did this work, Google’s baseline model for JFT was a deep convolutional neural network [7] that had been trained for about six months using asynchronous stochastic gradient descent on a large number of cores”, suggesting obtaining from a database)
and consistent with a requested prediction included in the received information, the models having a plurality of layers, a plurality of nodes, and a model type, wherein the requested prediction predicts input of a user based on features extracted from partial inputs (Page 4, §4; “In this section, we investigate the effects of ensembling Deep Neural Network (DNN) acoustic models that are used in Automatic Speech Recognition (ASR) . . . The input is 26 frames of 40 Mel-scaled filterbank coefficients with a 10ms advance per frame and we predict the HMM state of 21st frame”, suggesting a requested prediction that predicts input of a user (from ASR signals) based on extracted features (26 frames); and Page 3, §3; the section discloses that the models have a plurality of layers (hidden layers), nodes, and a model type (neural net))
at least one processor; and at least one storage medium storing instructions that, when executed, configure the processor to perform operations comprising: (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer)
obtaining, from the model clusterer, a plurality of the clustered machine learning models (Abstract; "an ensemble of models", suggesting obtaining a plurality of machine learning models)
obtaining a plurality of training data sets consistent with the clustered models from at least one of the database, the device, or the input recognizer, each set corresponding to one of the models (Page 4, Section 4, Paragraph 1)
applying the clustered models to the corresponding training data sets to obtain output sets associated with the models, each output set corresponding to one of the clustered models (Page 4, Section 4; the ensemble of models are applied to the training set to obtain outputs)
combining the output sets to form a final output set (Page 4, Section 4)
mapping the final output set to features of the clustered models (Page 4, Section 4; "our acoustic model P which maps acoustic observations at time t", suggesting mapping outputs to features of the models (acoustic models))
combining the mapped features of the models into a singular machine learning model (Abstract; "we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model"; and Page 4, Section 4; "distilling an ensemble of models into a single model that works significantly better than a model of the same size that is learned directly from the same training data")
training the singular machine learning model using the training data set (Page 4, Section 4; “distilling an ensemble of models into a single model that works significantly better than a model of the same size that is learned directly from the same training data", suggesting training the singular machine learning model (distilled model))
and outputting the trained singular machine learning model (Page 5, Table 1; the table shows an output of the analysis).
Hinton fails to explicitly disclose a model clusterer configured to: obtain the machine learning models from the model generator, and cluster models by input or output type according to same or similar machine learning models; the clustered machine learning models.
	Andoni discloses a model clusterer configured to: obtain the models from the model generator; and automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type or a similar output type (Claim 17; “The method of claim 11, further comprising: grouping models of a plurality of models into species based on similarities between the models”, which discloses, under a broadest reasonable interpretation of the claim language in view of the specification at [0030] and in view of the 112(b) indefiniteness rejection above, obtaining models from the model generator and automatically generating clusters or groups of the models based on the models being similar; and [0100]; “The method 1000 may include grouping the models of the plurality of models into species based on genetic distance between models”; and [0095]; “determining, by a processor of a computing device, a subset of model”, which discloses obtaining the models from the model generator to then cluster or group them by similarity)
the clustered machine learning models (Claim 17; “The method of claim 11, further comprising: grouping models of a plurality of models into species based on similarities between the models”).
The motivation to combine Hinton and Andoni is the same as discussed above with respect to claim 1.

Regarding claim 2, the rejection of claim 1 is incorporated and Hinton further discloses wherein the models comprise at least one neural network (Page 3, Section 3).

Regarding claim 5, the rejection of claim 1 is incorporated and Hinton further discloses wherein the singular machine learning model comprises a neural network that overfits the clustered models (Page 5, Section 5, and Page 7, Section 6; "training the baseline model with hard targets leads to severe overfitting").
Hinton fails to explicitly disclose clustered machine learning models.
Andoni discloses clustered machine learning models (Claim 17; “The method of claim 11, further comprising: grouping models of a plurality of models into species based on similarities between the models”).
The motivation to combine Hinton and Andoni is the same as discussed above with respect to claim 1.

Regarding claim 6, the rejections of claims 1 and 5 are incorporated and Hinton further discloses wherein the singular machine learning model comprises a plurality of layers, and a number of layers is at least as great as a number of layers in a first one of the models having a largest number of layers (Page 4, Paragraph 2; "the distilled net had 300 or more units in each of its two hidden layers"; and Page 3, Last Paragraph going into Page 4, First Paragraph; the large and small neural nets both have two hidden layers, which is the same number of layers as the distilled (singular) net).

Regarding claim 7, the rejections of claims 1, 5, and 6 are incorporated and Hinton further discloses wherein each layer of the singular machine learning model comprises a plurality of nodes, the number of machine learning model nodes being at least as great as the number of nodes in corresponding layers in one of the models having a largest number of nodes (Page 3, Section 3; "rectified linear hidden units"; and Page 3, Last Paragraph going into Page 4, First Paragraph; the distilled unit (singular model) may have as many units (nodes) as the corresponding layer of a model in the plurality of models that has a largest number of nodes; and Page 8, Section 8; "we have shown that nearly all of the improvement that is achieved by training an ensemble of deep neural nets can be distilled into a single neural net of the same size which is far easier to deploy").

Regarding claims 8 and 13, the rejections of claims 1 and 12 are incorporated and Hinton further discloses wherein mapping the outputs to features of the models comprises applying one or more weights to the outputs during mapping (Page 6, Section 5.2; the section discloses applying one or more weights during mapping).

Regarding claims 9 and 14, the rejections of claims 1, 8, 12, and 13 are incorporated and Hinton further discloses wherein the one or more weights are equal to each other (Page 3, Section 3; "Dropout can be viewed as a way of training an exponentially large ensemble of models that share weights"; and Page 6, Section 5.2; "each specialist model is initialized with the weights of the generalist model").

Regarding claims 10 and 15, the rejections of claims 1, 8, 12, and 13 are incorporated and Hinton further discloses wherein the one or more weights comprise inputs from a user (Page 5, Section 4.1; "we tried temperatures of [1,2, 5, 10] and used a relative weight of 0.5 on the cross-entropy for the hard targets").
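
For context only, the temperature and the relative weight referred to in the passage cited above can be illustrated with a minimal sketch of a distillation-style loss. This is not Hinton's implementation: the function names, the default parameter values, the sample logits, and the way the soft-target and hard-target terms are summed are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy of the predicted distribution against the target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, hard_weight=0.5):
    """Soft-target term plus a weighted hard-target term (combination is an assumption)."""
    soft_targets = softmax(teacher_logits, temperature)
    soft_term = cross_entropy(soft_targets, softmax(student_logits, temperature))
    hard_targets = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_term = cross_entropy(hard_targets, softmax(student_logits, 1.0))
    return soft_term + hard_weight * hard_term


# Hypothetical logits: a student that matches the teacher versus one that does not.
matching = distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0], hard_label=0)
mismatched = distillation_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0], hard_label=0)
```

In this sketch both `temperature` and `hard_weight` are values chosen by whoever runs the training, which is the sense in which the cited passage describes trying several temperatures and a relative weight of 0.5.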

Regarding claim 11, the rejection of claim 1 is incorporated and Hinton further discloses wherein outputting the trained singular machine learning model comprises at least one of storing the trained singular machine learning model in the at least one storage medium or transmitting the trained singular machine learning model to a user device (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer and storing the models in a storage medium).


Regarding claim 16, the rejection of claim 12 is incorporated and Hinton further discloses wherein the mapped features of the models comprise feature vectors extracted from the training data sets (Page 4, Section 4; "State-of-the-art ASR systems currently use DNNs to map a (short) temporal context of features derived from the waveform to a probability distribution over the discrete states of a Hidden Markov Model").

Regarding claim 17, the rejection of claim 12 is incorporated and Hinton further discloses wherein the operations further comprise training the singular machine learning model using at least one new training data set (Page 5, Section 5.1; "processing different mini-batches from the training").

Regarding claim 18, the rejection of claim 12 is incorporated and Hinton further discloses wherein training the singular machine learning model comprises recursive adjustments of one or more parameters of the singular machine learning model (Page 5, Section 5.1; "Each replica computes the average gradient on its current mini-batch and sends this gradient to a sharded parameter server which sends back new values for the parameters").

Claims 3 and 4 are rejected under 35 U.S.C. § 103 as being obvious over Hinton in view of Andoni and Helmy (Helmy et al., "Non-linear Heterogeneous Ensemble Model for Permeability Prediction of Oil Reservoirs", Mar. 2, 2013, Arab J Sci Eng., pp. 1379-1395, hereinafter “Helmy”).

Regarding claim 3, the rejection of claim 1 is incorporated but Hinton fails to explicitly disclose wherein the models comprise at least one linear regression.
Helmy discloses wherein the models comprise at least one linear regression (Page 1381, Column 2; "Fundamentally, in support vector regression (SVR) the data is mapped into a high-dimensional feature space (F) via a non-linear mapping () and then the linear regression is done in this space", suggesting the linear regression model).
Hinton, Andoni, and Helmy are analogous art because all are concerned with machine learning applications.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the linear regression of Helmy with the system of Hinton and Andoni to yield the predictable result of wherein the plurality of machine learning models includes at least one linear regression. The motivation for doing so would be to increase the generalization capability of an ensemble of models to provide for accurate predictions (Helmy; Abstract).

Regarding claim 4, the rejection of claim 1 is incorporated but Hinton fails to explicitly disclose wherein the operations further comprise: determining whether the models comprise a same model type; when the models are determined to comprise the same model type, selecting the same model type for the singular machine learning model; and when the models are determined to comprise different model types, selecting a neural network type for the singular machine learning model.
The combination of Hinton and Helmy discloses wherein the operations further comprise: determining whether the models comprise a same model type (Hinton, Page 8, Discussion; only neural networks are used in the ensemble and the distilled model)
when the models are determined to comprise the same model type, selecting the same model type for the singular machine learning model; and when the models are determined to comprise different model types, selecting a neural network type for the singular machine learning model (Hinton, Page 8, Discussion; a single neural network is produced or distilled from an ensemble of neural networks; and Helmy, Abstract; "In this paper, an ensemble of SVM, ANN and ANFIS is proposed to predict the permeability of oil reservoirs by using real-life well logs. An ANN model is used to implement a non-linear ensemble strategy").
The motivation to combine Hinton, Andoni, and Helmy is the same as discussed above with respect to claim 3.

Claim 19 is rejected under 35 U.S.C. § 103 as being obvious over Hinton in view of Andoni and Zhao et al. (Zhao et al., "Design of ensemble neural network using the Akaike information criterion", Dec. 2008, Engineering Applications of Artificial Intelligence 21, pp. 1182-1188, hereinafter “Zhao”).

Regarding claim 19, the rejections of claims 12 and 18 are incorporated but Hinton fails to explicitly disclose wherein the recursive adjustments are configured to reduce at least one of root-mean-square deviation (RMSD), Akaike information criterion (AIC), or logarithmic loss (LOGLOSS).
Zhao discloses wherein the recursive adjustments are configured to reduce at least one of root-mean-square deviation (RMSD), Akaike information criterion (AIC), or logarithmic loss (LOGLOSS) (Abstract; "In this paper, an ensemble neural network algorithm is proposed based on the Akaike information criterion", suggesting the AIC).
Hinton, Andoni, and Zhao are analogous art because all are concerned with machine learning applications.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the Akaike information criterion of Zhao with the system of Hinton and Andoni to yield the predictable result of wherein the recursive adjustments are configured to reduce at least one of root-mean-square deviation (RMSD), Akaike information criterion (AIC), or logarithmic loss (LOGLOSS). The motivation for doing so would be to find the best combination weights of the ensemble neural network (Zhao; Abstract).

Claim 20 is rejected under 35 U.S.C. § 103 as being obvious over Hinton in view of Andoni and in further view of Nguyen et al. (Nguyen et al., "Stopping criteria for ensemble of evolutionary artificial neural networks", Nov. 2005, Applied Soft Computing, Volume 6, Issue 1, pp. 100-107, hereinafter “Nguyen”) and Lekivetz et al. (US 20200117580 A1, hereinafter “Lekivetz”).

Regarding claim 20, Hinton discloses [a] system for generating a singular ensemble model, comprising: (Abstract; “We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse”, suggesting the system for generating a singular ensemble model)
a model generator configured to: obtain information from the input recognizer, and generate a plurality of machine learning models based on the obtained information, the models having a plurality of layers, a plurality of nodes, and a model type; (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer, the processor being, under a broadest reasonable interpretation of the claim language read in view of paragraph [0025] of the originally filed specification and as interpreted under 35 USC § 112(f), a model generator or general purpose computer; and Abstract; "an ensemble of models", suggesting generating a plurality of machine learning models; and Page 5, §4.1; “We trained 10 separate models to predict P”, suggesting obtaining and generating a plurality of machine learning models based on obtained information; and Page 7, §5.5; “we trained 61 specialist models, each with 300 classes (plus the dustbin class)”, further suggesting obtaining from a database and generating the models; and Page 5, §5.1; “JFT is an internal Google dataset that has 100 million labeled images with 15,000 labels. When we did this work, Google’s baseline model for JFT was a deep convolutional neural network [7] that had been trained for about six months using asynchronous stochastic gradient descent on a large number of cores”, suggesting obtaining from a database; and Page 3, §3; the section discloses that the models have a plurality of layers (hidden layers), nodes, and a model type (neural net))
at least one processor; and at least one storage medium storing instructions that, when executed, configure the processor to perform operations comprising: (Page 4, Section 4; it is clear that Hinton performs their method on a computer. A processor and a storage medium are inherent in performing the method on a computer)
obtaining, from the model clusterer, a plurality of the clustered models (Abstract; "an ensemble of models", suggesting obtaining a plurality of machine learning models)
obtaining, from the input recognizer, the training data set, the training data set being consistent with the clustered models (Page 4, Section 4, Paragraph 1)
applying the plurality of clustered models to the training data set to obtain outputs associated with the models (Page 4, Section 4; the ensemble of models are applied to the training set to obtain outputs)
mapping the outputs to features of the models (Page 4, Section 4; "our acoustic model P which maps acoustic observations at time t", suggesting mapping outputs to features of the models (acoustic models))
combining the mapped features of the models into a singular machine learning model (Abstract; “we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model”; and Page 4, Section 4; “distilling an ensemble of models into a single model that works significantly better than a model of the same size that is learned directly from the same training data”)
applying the singular machine learning model to the training data set to obtain output (Page 5, Table 1)
comparing the outputs to the output of the singular machine learning model (Page 5, Table 1)
adjusting one or more parameters of the singular machine learning model based on the comparison (Page 5, Section 5.1; “Each replica computes the average gradient on its current mini-batch and sends this gradient to a sharded parameter server which sends back new values for the parameters”, suggesting the adjusting of one or more parameters based on the comparison).
Hinton fails to explicitly disclose an input recognizer configured to: receive a request for a model from a client device; provide, based on the request, an indication to a model generator to generate a model; and cluster data to create a training data set; a model clusterer configured to: obtain the machine learning models from the model generator; and automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type and similar output type; the clustered machine learning models; determining whether the comparing and the adjusting comply with at least one threshold, wherein the one or more threshold comprises at least one of a direct threshold or a threshold relative to last iteration of a parameter adjustment; and outputting the trained singular machine learning model to the client device, the client device being configured to: store a data structure defining parameters of the trained singular machine learning model; retrieve input data for the trained singular machine learning model; apply the trained singular machine learning model to the input data to generate results; and display the results.
Andoni discloses a model clusterer configured to: obtain the models from the model generator; and automatically generate clusters of the models, the clusters comprising a plurality of the models having at least one of a similar input type or a similar output type (Claim 17; “The method of claim 11, further comprising : grouping models of a plurality of models into species based on similarities between the models”, which discloses, under a broadest reasonable interpretation of the claim language in view of the specification at [0030] and in view of the 112(b) indefiniteness rejection above, obtaining models from the model generator and automatically generating clusters or groups of the models based on the models being similar; and [0100]; “The method 1000 may include grouping the models of the plurality of models into species based on genetic distance between models”; and [0095]; “determining, by a processor of a computing device, a subset of model”, which discloses obtaining the models from the model generator to then cluster or group them by similarity)
the clustered machine learning models (Claim 17; “The method of claim 11, further comprising : grouping models of a plurality of models into species based on similarities between the models”).
The motivation to combine Hinton and Andoni is the same as discussed above with respect to claim 1.
Nguyen discloses determining whether the comparing and the adjusting comply with at least one threshold, wherein the one or more threshold comprises at least one of a direct threshold or a threshold relative to last iteration of a parameter adjustment (Abstract; “In this paper, we show that different early stopping criteria based on (i) the minimum validation fitness of the ensemble, and (ii) the minimum of the average population validation fitness could generalize better than the survival population in the last generation”, suggesting the comparing and adjusting based on a threshold or stopping criterion; and Page 104, §2.6.2; the section discloses comparing and adjusting based on a threshold for a winner take all machine learning approach for ensemble learning).
Hinton, Andoni, and Nguyen are analogous art because all are concerned with machine learning applications.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the comparing and adjusting against a threshold of Nguyen with the system of Hinton and Andoni to yield the predictable result of determining whether the comparing and the adjusting comply with one or more thresholds, wherein the one or more thresholds comprise direct thresholds or thresholds relative to a last iteration. The motivation for doing so would be to enhance generalization so that neural networks will not overfit training data (Nguyen; Conclusion).
Lekivetz discloses an input recognizer configured to: receive a request for a model from a client device; ([0004]; “The computing device receives a request requesting an evaluation of the data for generating a model to predict responses based on the plurality of factors”, which discloses receiving the request from a client or computing device)
provide, based on the request, an indication to a model generator to generate a model; and ([0004]; “The computing device receives a request requesting an evaluation of the data for generating a model to predict responses based on the plurality of factors”, which discloses the indication to generate a model)
cluster data to create a training data set ([0167]; “For example, the raw form of the training data can be smoothed, truncated, aggregated, clustered, or otherwise manipulated into another form, which can then be used for training the machine-learning model”)
outputting the trained singular machine learning model to the client device ([0165]; Figure 11; Figure 13; [0179])
the client device being configured to: store a data structure defining parameters of the trained singular machine learning model; ([0167]; the training data is retrieved from a database and it is thus stored, and the training data is the data structure defining parameters of the trained singular machine learning model under a broadest reasonable interpretation of the claim language; and [0179]; “The machine-learning model(s) can be implemented using a single computing device or multiple computing devices, such as the communications grid computing system 400 discussed above”)
retrieve input data for the trained singular machine learning model; ([0167]; and Figure 11, 1104)
apply the trained singular machine learning model to the input data to generate results; and (Figure 11, 1112)
display the results ([0172]; “provide a result”; and [0179]; “The machine-learning model(s) can be implemented using a single computing device or multiple computing devices, such as the communications grid computing system 400 discussed above”; and Figure 13, 1306).

Response to Arguments

Applicant’s arguments, filed on 11/10/2020, with respect to the 35 USC § 103 rejection of claims 1-20 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 12, and 20.  Hinton and Andoni are now being used to render claims 1 and 12 obvious under 35 USC § 103, and Hinton, Andoni, Nguyen, and Lekivetz are now being used to render claim 20 obvious under 35 USC § 103.
 
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403.  The examiner can normally be reached on Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
 
/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125