DETAILED ACTION
This action is in response to the application filed 09/28/2018. Claims 1-20 are pending and have been considered. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
3. The information disclosure statements (IDS) submitted on 09/28/2018, 06/16/2020, and 07/23/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 7-10, 14-16, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Aslan et al. ("US 20170132528 A1", hereinafter "Aslan") in view of Myers ("Efficient Amplification of the Security of Weak Pseudo-Random Function Generators", hereinafter "Myers").

Regarding claim 1, Aslan teaches A method comprising:
providing a plurality of data elements (“The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102. For example, the training data 104 can comprise a repository of images that are to be classified or labeled by the machine learning models 100 and/or 102. The training data 104 can further include at least two additional components: features and labels. However, the training data 104 may be unlabeled in some implementations, such that the machine learning models 100 and/or 102 can be trained using any suitable learning technique, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and so on.” [¶0024; Examiner is interpreting training data to be equivalent to data elements.]) for training a plurality of machine learning models combined into a machine learning ensemble (“The machine learning models 100 and 102, and any of the machine learning models discussed herein, can be implemented as any type of machine learning model…. An “ensemble” can comprise a collection of models whose outputs (predictions) are combined, such as by using weighted averaging or voting. The individual machine learning models of an ensemble can differ in their expertise, and the ensemble can operate as a committee of individual machine learning models that is collectively “smarter” than any individual machine learning model of the ensemble.” [¶0022]); 
training the machine learning ensemble using the plurality of data elements to produce a trained machine learning ensemble (“FIG. 1 further illustrates that training data 104 can be used to train at least one of the machine learning models 100 and/or 102. FIG. 1 shows that both machine learning models 100 and 102 can receive at least some of the training data 104, but this is merely shown for exemplary purposes.” [¶0023, the machine learning models in the ensemble receiving training data would imply that a machine learning ensemble is produced.]); and 
choosing, one of the plurality of machine learning models to provide an output in response to receiving an input during inference operation of the machine learning ensemble (“For example, the first model 100 can be trained to infer a set of probabilities for a multi-label classification task based on unknown image data received as input, and the second model 100 can be trained to classify the unknown image data as one of multiple possible class labels, but does not infer a set of probabilities as output. The tasks are similar in that they relate to classifying unknown images by one of multiple class labels, but one model (the first model 100) outputs a set of probabilities as a prediction while the other model (the second model 102) outputs class labels. In general, the “task” can comprise a task to infer an expected output based at least in part on an unknown input.” [¶0025; Aslan discloses a first/second model providing an expected output in response to receiving an input, this would correspond to choosing a model to provide an output. An inference operation would correspond to a model receiving an input and producing an output in response to the input which is disclosed by Aslan.]).
However, Aslan fails to explicitly teach pseudo-randomly choosing, using a piecewise function.
Myers teaches pseudo-randomly choosing, using a piecewise function (“[Image: media_image1.png]” [pg. 7, ¶2; Furthermore, Myers discloses “For any set A, let x ∈ A be the action of uniformly at random choosing an element x from A. For any distribution D, let x ∈ D be the action of randomly choosing an element according to D. It will be clear from context when ∈ is used to refer to an element in a set, and when it refers to choosing from a distribution.” [pg. 3, § Notation 5.]]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model selection method disclosed by Aslan to implement a random number generator function to randomly select data as taught by Myers. One would have been motivated to make this modification in order to amplify security and protect the machine learning models from attacks.
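For illustration only, the limitation at issue (pseudo-randomly choosing one model of an ensemble, using a piecewise function, to answer a given input) can be sketched in Python as follows. The sketch is not taken from Aslan, Myers, or the application. The names unit_interval_hash, choose_model, and ensemble_infer, the use of SHA-256, and the equal-width interval rule are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Sequence

# Hypothetical model type: any callable that maps raw input bytes to a label.
Model = Callable[[bytes], str]


def unit_interval_hash(data: bytes) -> float:
    """Deterministically map the input bytes to a value in [0, 1]."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def choose_model(data: bytes, models: Sequence[Model]) -> int:
    """Piecewise rule (assumed): the interval [i/k, (i+1)/k) selects model i."""
    k = len(models)
    u = unit_interval_hash(data)
    # min() guards against floating-point rounding of u up to exactly 1.0.
    return min(int(u * k), k - 1)


def ensemble_infer(data: bytes, models: Sequence[Model]) -> str:
    """Return the output of the single model selected for this input."""
    return models[choose_model(data, models)](data)
```

In this sketch the same input always reaches the same model, while different inputs are spread across the models in a way that is hard to predict without evaluating the hash.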

	Regarding claim 3, the combination of Aslan and Myers teaches The method of claim 1, wherein each of the plurality of machine learning models is a neural network (“The machine learning models 100 and 102, and any of the machine learning models discussed herein, can be implemented as any type of machine learning model. For example, suitable machine learning models for use with the techniques and systems described herein include, without limitation, tree-based models, support vector machines (SVMs), kernel methods, neural networks, random forests, splines (e.g., multivariate adaptive regression splines), hidden Markov model (HMMs), Kalman filters (or enhanced Kalman filters), Bayesian networks (or Bayesian belief networks), expectation maximization, genetic algorithms, linear regression algorithms, nonlinear regression algorithms, logistic regression-based classification models, or an ensemble thereof.” [¶0022]).

	Regarding claim 7, the combination of Aslan and Myers teaches The method of claim 1, wherein each of the plurality of machine learning models uses one of either a same training set selected from the plurality of data elements, different training sets that have one or more of the same data elements, and disjunct training sets (“The N teacher models 200 can be of the same type and size, or can differ in type (i.e., architecture) and/or size. In the implementation of FIG. 2, the student model 202 is to be jointly trained in parallel with the N teacher models 200, where each model 200(1)-(N) and 202 is to learn substantially similar tasks. In this sense, each of the teacher models 200 can influence the training of the student model 202, and vice versa, during joint training. Each of the N teacher models 200 is also shown as receiving corresponding training data 204(1)-(N). The training data 204(1)-(N) can each comprise an independent source of training data, or the training data 204(1)-(N) can represent a single source of training data 204 that is used by the teacher models 200 for training.” [¶0047; training data coming from a single source would mean the same set of training data is used to train the machine learning models.]).

Regarding claim 8, the combination of Aslan and Myers teaches The method of claim 1, where Aslan further teaches wherein all the plurality of machine learning models are binary classification models (“At 602, a set of multiple machine learning models, such as the first model 100 and the second model 102 of FIG. 1, can be provided. Each of the machine learning models in the set can be capable of learning a task, such as a classification task (binary or multi-label), a regression task to infer a set of probabilities based on unknown input data, or any other suitable machine learning task.” [¶0059]).

Regarding claim 9, Aslan teaches A method comprising:
combining a plurality of machine learning models into a machine learning ensemble (“The machine learning models 100 and 102, and any of the machine learning models discussed herein, can be implemented as any type of machine learning model…. An “ensemble” can comprise a collection of models whose outputs (predictions) are combined, such as by using weighted averaging or voting. The individual machine learning models of an ensemble can differ in their expertise, and the ensemble can operate as a committee of individual machine learning models that is collectively “smarter” than any individual machine learning model of the ensemble.” [¶0022]); 
providing a plurality of data elements for training the machine learning ensemble (“The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102. For example, the training data 104 can comprise a repository of images that are to be classified or labeled by the machine learning models 100 and/or 102. The training data 104 can further include at least two additional components: features and labels. However, the training data 104 may be unlabeled in some implementations, such that the machine learning models 100 and/or 102 can be trained using any suitable learning technique, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and so on.” [¶0024; Examiner is interpreting training data to be equivalent to data elements.]); 
training the machine learning ensemble using the plurality of data elements to produce a trained machine learning ensemble (“FIG. 1 further illustrates that training data 104 can be used to train at least one of the machine learning models 100 and/or 102. FIG. 1 shows that both machine learning models 100 and 102 can receive at least some of the training data 104, but this is merely shown for exemplary purposes.” [¶0023, the machine learning models in the ensemble receiving training data would imply that a machine learning ensemble is produced.]); 
receiving an input during inference operation of the machine learning ensemble (“For example, the first model 100 can be trained to infer a set of probabilities for a multi-label classification task based on unknown image data received as input” [¶0025]); and 
choosing, one of the plurality of machine learning models to provide an output in response to the input (“The tasks are similar in that they relate to classifying unknown images by one of multiple class labels, but one model (the first model 100) outputs a set of probabilities as a prediction while the other model (the second model 102) outputs class labels. In general, the “task” can comprise a task to infer an expected output based at least in part on an unknown input.” [¶0025; Aslan discloses a first/second model providing an expected output in response to receiving an input, this would correspond to choosing a model to provide an output.]).
However, Aslan fails to explicitly teach pseudo-randomly choosing, using a piecewise function.
Myers teaches pseudo-randomly choosing, using a piecewise function (“[Image: media_image1.png]” [pg. 7, ¶2; Furthermore, Myers discloses “For any set A, let x ∈ A be the action of uniformly at random choosing an element x from A. For any distribution D, let x ∈ D be the action of randomly choosing an element according to D. It will be clear from context when ∈ is used to refer to an element in a set, and when it refers to choosing from a distribution.” [pg. 3, § Notation 5.]]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model selection method disclosed by Aslan to implement a random number generator function to randomly select data as taught by Myers. One would have been motivated to make this modification in order to amplify security and protect the machine learning models from attacks.

Regarding claim 10, the combination of Aslan and Myers teaches The method of claim 9, wherein each of the plurality of machine learning models is a neural network (“The machine learning models 100 and 102, and any of the machine learning models discussed herein, can be implemented as any type of machine learning model. For example, suitable machine learning models for use with the techniques and systems described herein include, without limitation, tree-based models, support vector machines (SVMs), kernel methods, neural networks, random forests, splines (e.g., multivariate adaptive regression splines), hidden Markov model (HMMs), Kalman filters (or enhanced Kalman filters), Bayesian networks (or Bayesian belief networks), expectation maximization, genetic algorithms, linear regression algorithms, nonlinear regression algorithms, logistic regression-based classification models, or an ensemble thereof.” [¶0022]).

Regarding claim 14, the combination of Aslan and Myers teaches The method of claim 9, where Aslan further teaches wherein each of the plurality of machine learning models uses one of either a same training set selected from the plurality of data elements, different training sets that have one or more of the same data elements, and disjunct training sets (“The N teacher models 200 can be of the same type and size, or can differ in type (i.e., architecture) and/or size. In the implementation of FIG. 2, the student model 202 is to be jointly trained in parallel with the N teacher models 200, where each model 200(1)-(N) and 202 is to learn substantially similar tasks. In this sense, each of the teacher models 200 can influence the training of the student model 202, and vice versa, during joint training. Each of the N teacher models 200 is also shown as receiving corresponding training data 204(1)-(N). The training data 204(1)-(N) can each comprise an independent source of training data, or the training data 204(1)-(N) can represent a single source of training data 204 that is used by the teacher models 200 for training.” [¶0047; training data coming from a single source would mean the same set of training data is used to train the machine learning models.]).

Regarding claim 15, the combination of Aslan and Myers teaches The method of claim 9, where Aslan further teaches wherein all the plurality of machine learning models are binary classification models (“At 602, a set of multiple machine learning models, such as the first model 100 and the second model 102 of FIG. 1, can be provided. Each of the machine learning models in the set can be capable of learning a task, such as a classification task (binary or multi-label), a regression task to infer a set of probabilities based on unknown input data, or any other suitable machine learning task.” [¶0059]).

Regarding claim 16, Aslan teaches A method comprising:
combining a plurality of machine learning models into a machine learning ensemble (“The machine learning models 100 and 102, and any of the machine learning models discussed herein, can be implemented as any type of machine learning model…. An “ensemble” can comprise a collection of models whose outputs (predictions) are combined, such as by using weighted averaging or voting. The individual machine learning models of an ensemble can differ in their expertise, and the ensemble can operate as a committee of individual machine learning models that is collectively “smarter” than any individual machine learning model of the ensemble.” [¶0022]); 
providing a plurality of data elements for training the machine learning ensemble (“The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102. For example, the training data 104 can comprise a repository of images that are to be classified or labeled by the machine learning models 100 and/or 102. The training data 104 can further include at least two additional components: features and labels. However, the training data 104 may be unlabeled in some implementations, such that the machine learning models 100 and/or 102 can be trained using any suitable learning technique, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and so on.” [¶0024; Examiner is interpreting training data to be equivalent to data elements.]), each of the plurality of machine learning models are implemented differently (“Additionally, or alternatively, the models involved in joint training according to the techniques and systems described herein can differ in: (i) the learning methods they employ during training, (ii) their respective speed of operation at runtime, (iii) their ability to be distributed across many different machines for use in parallel processing environments, or (iv) their “understandability” in that one model is in a language more comprehensible to humans than the other, and so on.” [¶0046]); 
training the machine learning ensemble using the plurality of data elements to produce a trained machine learning ensemble (“FIG. 1 further illustrates that training data 104 can be used to train at least one of the machine learning models 100 and/or 102. FIG. 1 shows that both machine learning models 100 and 102 can receive at least some of the training data 104, but this is merely shown for exemplary purposes.” [¶0023, the machine learning models in the ensemble receiving training data would imply that a machine learning ensemble is produced.]); 
receiving an input during inference operation of the machine learning ensemble (“For example, the first model 100 can be trained to infer a set of probabilities for a multi-label classification task based on unknown image data received as input” [¶0025]); and 
choosing, one of the plurality of machine learning models to provide an output in response to the input (“The tasks are similar in that they relate to classifying unknown images by one of multiple class labels, but one model (the first model 100) outputs a set of probabilities as a prediction while the other model (the second model 102) outputs class labels. In general, the “task” can comprise a task to infer an expected output based at least in part on an unknown input.” [¶0025; Aslan discloses a first/second model providing an expected output in response to receiving an input, this would correspond to choosing a model to provide an output.]).
However, Aslan fails to explicitly teach pseudo-randomly choosing, using a piecewise function.
Myers teaches pseudo-randomly choosing, using a piecewise function (“[Image: media_image1.png]” [pg. 7, ¶2; Furthermore, Myers discloses “For any set A, let x ∈ A be the action of uniformly at random choosing an element x from A. For any distribution D, let x ∈ D be the action of randomly choosing an element according to D. It will be clear from context when ∈ is used to refer to an element in a set, and when it refers to choosing from a distribution.” [pg. 3, § Notation 5.]]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model selection method disclosed by Aslan to implement a random number generator function to randomly select data as taught by Myers. One would have been motivated to make this modification in order to amplify security and protect the machine learning models from attacks.

Regarding claim 19, the combination of Aslan and Myers teaches The method of claim 16, where Aslan further teaches wherein each of the plurality of machine learning models uses one of either a same training set selected from the plurality of data elements, different training sets that have one or more of the same data elements, and disjunct training sets (“The N teacher models 200 can be of the same type and size, or can differ in type (i.e., architecture) and/or size. In the implementation of FIG. 2, the student model 202 is to be jointly trained in parallel with the N teacher models 200, where each model 200(1)-(N) and 202 is to learn substantially similar tasks. In this sense, each of the teacher models 200 can influence the training of the student model 202, and vice versa, during joint training. Each of the N teacher models 200 is also shown as receiving corresponding training data 204(1)-(N). The training data 204(1)-(N) can each comprise an independent source of training data, or the training data 204(1)-(N) can represent a single source of training data 204 that is used by the teacher models 200 for training.” [¶0047; training data coming from a single source would mean the same set of training data is used to train the machine learning models.]).

Regarding claim 20, the combination of Aslan and Myers teaches The method of claim 16, where Aslan further teaches wherein all the plurality of machine learning models are binary classification models (“At 602, a set of multiple machine learning models, such as the first model 100 and the second model 102 of FIG. 1, can be provided. Each of the machine learning models in the set can be capable of learning a task, such as a classification task (binary or multi-label), a regression task to infer a set of probabilities based on unknown input data, or any other suitable machine learning task.” [¶0059]).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Aslan in view of Myers further in view of Binev et al. ("Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions" filed by Applicant in IDS filed 09/28/2018, hereinafter "Binev").

Regarding claim 2, the combination of Aslan and Myers teaches The method of claim 1, but fails to explicitly teach wherein the piecewise function is further characterized as being a piecewise constant function.
Binev teaches wherein the piecewise function is further characterized as being a piecewise constant function (“The universal estimator studied in this paper consists of a least-square fitting procedure using piecewise constant functions on a partition which depends adaptively on the data.” [Abstract]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. Binev teaches universal algorithms for learning using piecewise constant functions. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Aslan and Myers to substitute the piecewise function taught by Myers with the piecewise constant function as taught by Binev. One would have been motivated to use a piecewise constant function in order to find the optimal rate of convergence in wide sets of data [Binev, Abstract].
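For reference, a piecewise constant function takes a single fixed value on each interval of a partition of its domain. The following Python sketch illustrates only that general notion. It is not Binev's adaptive least-squares estimator, and the helper name piecewise_constant and the example breakpoints are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def piecewise_constant(
    breakpoints: Sequence[float], values: Sequence[int]
) -> Callable[[float], int]:
    """Build f with f(x) = values[i] on the i-th interval cut by breakpoints.

    With breakpoints b_0 < b_1 < ... < b_(n-1), the intervals are
    (-inf, b_0), [b_0, b_1), ..., [b_(n-1), +inf), so n + 1 values are needed.
    """
    if len(values) != len(breakpoints) + 1:
        raise ValueError("need exactly one more value than breakpoints")
    if list(breakpoints) != sorted(breakpoints):
        raise ValueError("breakpoints must be in ascending order")

    def f(x: float) -> int:
        # bisect_right counts the breakpoints that are <= x,
        # which is the index of the interval containing x.
        return values[bisect_right(breakpoints, x)]

    return f


# Three intervals, each mapped to a constant (here, a model index).
select = piecewise_constant([0.25, 0.5], [0, 1, 2])
```

Here select(0.1) returns 0, select(0.25) and select(0.3) return 1, and select(0.9) returns 2, so the output changes only at the breakpoints.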

Claims 4, 5, 11, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Aslan in view of Myers and further in view of Beyer et al. ("US 20110320184 A1", hereinafter "Beyer").

Regarding claim 4, the combination of Aslan and Myers teaches The method of claim 1, where Aslan further teaches wherein the each of the plurality of machine learning models have different machine learning algorithms (“Additionally, or alternatively, the models involved in joint training according to the techniques and systems described herein can differ in: (i) the learning methods they employ during training, (ii) their respective speed of operation at runtime, (iii) their ability to be distributed across many different machines for use in parallel processing environments, or (iv) their “understandability” in that one model is in a language more comprehensible to humans than the other, and so on.” [¶0046]), 
However, the combination of Aslan and Myers fails to explicitly teach and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness.
Beyer teaches and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness (“Initialize generator G1 with a fixed seed s01. All nodes use the same seed.” [¶0056; note: Generator G1 is a pseudo-random number generator function, which implies “providing pseudo-randomness”.]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. Beyer teaches parallel computing using pseudo-random number generators. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo-random number generator as taught by Aslan and Myers to further input a seed to provide pseudo-randomness as disclosed by Beyer. Inputting a seed into a pseudo-random function is well-known and would yield predictable results.
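For illustration only, seeding a pseudo-random number generator with the input itself can be sketched in Python as shown below. The function name choose_by_seed is hypothetical, and the sketch uses the Python standard-library generator, which is not cryptographically secure and is not the generator described in Beyer.

```python
import random


def choose_by_seed(data: bytes, k: int) -> int:
    """Return a model index in range(k), using the input bytes as the seed."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # A fresh generator per call keeps the choice a function of the input only.
    rng = random.Random(data)
    return rng.randrange(k)
```

Because the generator is seeded with the input, repeating a query with the same input always yields the same index, while the index for a new input cannot be read off without running the generator.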

Regarding claim 5, the combination of Aslan and Myers teaches The method of claim 1, where Aslan further teaches during the inference operation (“For example, the first model 100 can be trained to infer a set of probabilities for a multi-label classification task based on unknown image data received as input, and the second model 100 can be trained to classify the unknown image data as one of multiple possible class labels, but does not infer a set of probabilities as output. The tasks are similar in that they relate to classifying unknown images by one of multiple class labels, but one model (the first model 100) outputs a set of probabilities as a prediction while the other model (the second model 102) outputs class labels. In general, the “task” can comprise a task to infer an expected output based at least in part on an unknown input.” [¶0025; Aslan discloses a first/second model providing an expected output in response to receiving an input, this would correspond to choosing a model to provide an output. An inference operation would correspond to a model receiving an input and producing an output in response to the input which is disclosed by Aslan.])
Myers further teaches wherein the pseudo-random function is defined as F: 2^s → {0, 1, …, k − 1} (“[Image: media_image2.png]” [pg. 5, § 2.2 Function Generators, § Definition 5]).
However, the combination fails to explicitly teach where s is a bit size of the input, and k is the number of machine learning models in the plurality of machine learning models.
Beyer teaches where s is a bit size of the input (“[Image: media_image3.png]” [¶0030; Beyer discloses the generator uses a multiple of 32 for the number of bits in each sequence of seeds which would correspond to 2^bit size of the input.]), and k is the number of machine learning models in the plurality of machine learning models (“[Image: media_image4.png]” [¶0064; note: Beyer’s pseudo-random function provides the option of “k-1”, however doesn’t explicitly teach the machine learning models.]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. Beyer teaches parallel computing using pseudo-random number generators. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo-random number generator as taught by Aslan and Myers to use a 2^bit size for the input and provide an option for randomly selecting as disclosed by Beyer. Using a 2^bit size in pseudo-random number generation is well-known and would yield predictable results. Furthermore, substituting the “k” provided by Beyer with the machine learning models taught by Aslan would be obvious as it would be implemented by simply switching data.
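For illustration only, a function from the 2^s possible s-bit inputs to the index set {0, 1, …, k − 1}, where k is the number of models, can be sketched in Python as follows. The sketch assumes a keyed construction based on HMAC-SHA-256 reduced modulo k. That construction, the secret key parameter, and the name prf_index are assumptions for the example and are not drawn from Myers, Beyer, or the application.

```python
import hashlib
import hmac


def prf_index(key: bytes, x: int, s: int, k: int) -> int:
    """Map an s-bit input x (0 <= x < 2**s) to an index in {0, ..., k - 1}."""
    if s < 1 or k < 1:
        raise ValueError("s and k must be at least 1")
    if not 0 <= x < 2**s:
        raise ValueError("x must fit in s bits")
    # Encode x in the smallest whole number of bytes that holds s bits.
    msg = x.to_bytes((s + 7) // 8, "big")
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    # Reducing modulo k introduces a small bias whenever k is not a power
    # of two; that is acceptable for a sketch.
    return int.from_bytes(tag, "big") % k
```

With s = 8 and k = 3, for example, each of the 256 possible inputs is assigned one of the indices 0, 1, or 2, and the assignment is fixed for a given key.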


Regarding claim 11, the combination of Aslan and Myers teaches The method of claim 9, where Aslan further teaches wherein the each of the plurality of machine learning models have different machine learning algorithms (“Additionally, or alternatively, the models involved in joint training according to the techniques and systems described herein can differ in: (i) the learning methods they employ during training, (ii) their respective speed of operation at runtime, (iii) their ability to be distributed across many different machines for use in parallel processing environments, or (iv) their “understandability” in that one model is in a language more comprehensible to humans than the other, and so on.” [¶0046]), 
However, the combination of Aslan and Myers fails to explicitly teach and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness.
Beyer teaches and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness (“Initialize generator G1 with a fixed seed s01. All nodes use the same seed.” [¶0056; note: Generator G1 is a pseudo-random number generator function, which implies “providing pseudo-randomness”.]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo-random function generator to increase security of the data. Beyer teaches parallel computing using pseudo-random number generators. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo-random number generator as taught by Aslan and Myers to further input a seed to provide pseudo-randomness as disclosed by Beyer. Inputting a seed into a pseudo-random function is well-known and would yield predictable results.

Regarding claim 12, the combination of Aslan and Myers teaches The method of claim 9, where Aslan further teaches during the inference operation (“For example, the first model 100 can be trained to infer a set of probabilities for a multi-label classification task based on unknown image data received as input, and the second model 100 can be trained to classify the unknown image data as one of multiple possible class labels, but does not infer a set of probabilities as output. The tasks are similar in that they relate to classifying unknown images by one of multiple class labels, but one model (the first model 100) outputs a set of probabilities as a prediction while the other model (the second model 102) outputs class labels. In general, the “task” can comprise a task to infer an expected output based at least in part on an unknown input.” [¶0025; note: Aslan discloses a first/second model providing an expected output in response to receiving an input, which would correspond to choosing a model to provide an output. An inference operation would correspond to a model receiving an input and producing an output in response to that input, which is disclosed by Aslan.]).
Myers further teaches wherein the pseudo-random function is defined as F: 2^s -> {0, 1, ..., k - 1} ([image media_image2.png: Myers, Definition 5] [pg. 5, § 2.2 Function Generators, § Definition 5]).
However, the combination fails to explicitly teach where s is a bit size of the input, and k is the number of machine learning models in the plurality of machine learning models.
Beyer teaches where s is a bit size of the input ([image media_image3.png: Beyer, ¶0030] [¶0030; note: Beyer discloses that the generator uses a multiple of 32 for the number of bits in each sequence of seeds, which would correspond to the 2^bit size of the input.]), and k is the number of machine learning models in the plurality of machine learning models ([image media_image4.png: Beyer, ¶0064] [¶0064; note: Beyer’s pseudo-random function provides the option of “k-1” but does not explicitly teach the machine learning models.]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo random function generator to increase security of the data. Beyer teaches parallel computing using pseudo-random number generators. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo random number generator as taught by Aslan and Myers to use a 2^bit size for the input and provide an option for randomly selecting as disclosed by Beyer. Using a 2^bit size in pseudo random number generation is well-known and would yield predictable results. Furthermore, substituting the “k” provided by Beyer with the machine learning models taught by Aslan would be obvious as it would be implemented by simply switching data. 
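By way of illustration only, a function of the claimed form F: 2^s -> {0, 1, ..., k - 1}, mapping an s-bit input to one of k model indices, can be sketched as follows. The sketch is not the construction of Myers or Beyer; HMAC-SHA256 is substituted here as a stand-in keyed pseudo-random function, and the names prf_select, key, and x are assumptions made for illustration.

```python
import hashlib
import hmac


def prf_select(key: bytes, x: bytes, k: int) -> int:
    """Map an s-bit input x (given as bytes) to an index in {0, ..., k - 1}.

    Hypothetical helper: HMAC-SHA256 stands in for the pseudo-random
    function, and the 256-bit digest is reduced modulo k.
    """
    if k <= 0:
        raise ValueError("k must be a positive number of models")
    digest = hmac.new(key, x, hashlib.sha256).digest()
    # Reduction modulo k; the bias is negligible when k is far below 2^256.
    return int.from_bytes(digest, "big") % k


if __name__ == "__main__":
    # The same key and input always select the same model index.
    print(prf_select(b"secret key", b"\x00\x01\x02\x03", 3))
```

Here s corresponds to the bit length of x and k to the number of models in the ensemble. Without the key, the mapping from input to selected model is intended to be unpredictable.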

Regarding claim 17, the combination of Aslan and Myers teaches The method of claim 16, where Aslan further teaches wherein the each of the plurality of machine learning models have different machine learning algorithms (“Additionally, or alternatively, the models involved in joint training according to the techniques and systems described herein can differ in: (i) the learning methods they employ during training, (ii) their respective speed of operation at runtime, (iii) their ability to be distributed across many different machines for use in parallel processing environments, or (iv) their “understandability” in that one model is in a language more comprehensible to humans than the other, and so on.” [¶0046]).
However, the combination of Aslan and Myers fails to explicitly teach and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness.
Beyer teaches and wherein the step of pseudo-randomly choosing takes the input as a seed for providing pseudo-randomness (“Initialize generator G1 with a fixed seed s01. All nodes use the same seed.” [¶0056; note: Generator G1 is a pseudo-random number generator function, which implies “providing pseudo-randomness”.]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo random function generator to increase security of the data. Beyer teaches parallel computing using pseudo-random number generators. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo random number generator as taught by Aslan and Myers to further input a seed to provide pseudo-randomness as disclosed by Beyer. Inputting a seed into a pseudo random function is well-known and would yield predictable results. 

Claims 6, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Aslan in view of Myers and further in view of Harmon et al. ("Activation Ensembles for Deep Neural Networks" filed by Applicant in the IDS filed 06/16/2020, hereinafter "Harmon").

Regarding claim 6, the combination of Aslan and Myers teaches The method of claim 1, but fails to explicitly teach wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble.
Harmon teaches wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble (“An activation ensemble consists of two important parts. The first is the main α parameter attached to each activation function for each neuron. This variable assigns a weight to each activation function considered, i.e., it designs a convex combination of activation functions. The second are a set of “offset” parameters, η and δ, which we use to dynamically offset normalization range for each function. Training of these new parameters occurs during typical model training and is done through backpropagation.” [pg. 2, top left col, ¶2]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo random function generator to increase security of the data. Harmon teaches training activation ensembles using backpropagation. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo random number generator as taught by Aslan and Myers to implement a back-propagation algorithm as disclosed by Harmon. One would have been motivated to use a back-propagation algorithm in order to update the weights and further optimize the model to select the best function [Harmon, pg. 1, § Introduction, ¶3-5].

Regarding claim 13, the combination of Aslan and Myers teaches The method of claim 9, but fails to explicitly teach wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble.
Harmon teaches wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble (“An activation ensemble consists of two important parts. The first is the main α parameter attached to each activation function for each neuron. This variable assigns a weight to each activation function considered, i.e., it designs a convex combination of activation functions. The second are a set of “offset” parameters, η and δ, which we use to dynamically offset normalization range for each function. Training of these new parameters occurs during typical model training and is done through backpropagation.” [pg. 2, top left col, ¶2]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo random function generator to increase security of the data. Harmon teaches training activation ensembles using backpropagation. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo random number generator as taught by Aslan and Myers to implement a back-propagation algorithm as disclosed by Harmon. One would have been motivated to use a back-propagation algorithm in order to update the weights and further optimize the model to select the best function [Harmon, pg. 1, § Introduction, ¶3-5].

Regarding claim 18, the combination of Aslan and Myers teaches The method of claim 16, but fails to explicitly teach wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble.
Harmon teaches wherein training the machine learning ensemble uses a back-propagation training algorithm to produce the trained machine learning ensemble (“An activation ensemble consists of two important parts. The first is the main α parameter attached to each activation function for each neuron. This variable assigns a weight to each activation function considered, i.e., it designs a convex combination of activation functions. The second are a set of “offset” parameters, η and δ, which we use to dynamically offset normalization range for each function. Training of these new parameters occurs during typical model training and is done through backpropagation.” [pg. 2, top left col, ¶2]).
Aslan teaches a machine learning ensemble method where a best model is selected based on the ensemble’s output. Myers teaches a model using a pseudo random function generator to increase security of the data. Harmon teaches training activation ensembles using backpropagation. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the model ensemble method and pseudo random number generator as taught by Aslan and Myers to implement a back-propagation algorithm as disclosed by Harmon. One would have been motivated to use a back-propagation algorithm in order to update the weights and further optimize the model to select the best function [Harmon, pg. 1, § Introduction, ¶3-5].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Boneh et al. ("Constrained Pseudorandom Functions and Their Applications") teaches pseudo random functions in the field of cryptography.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG, whose telephone number is (571) 272-8491. The examiner can normally be reached Mon-Fri, 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.H.H./ Examiner, Art Unit 2122

/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122