Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 11 is objected to because of the following informalities: 
In Claim 11, line 3, “models” should apparently read “the models”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-2, 5-6, 11-12, 14-15, 17-18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 1, lines 11-12, recites the limitation “at least some recommendation models” (emphasis added). It is unclear whether these recommendation models of the “second pipeline” are the same as, or different from, the recommendation models of the “first pipeline” recited in line 7. Dependent claims are also subsequently rejected.
Claim 11, line 5, recites the limitation “a model”. It is unclear whether this model is the same as, or different from, the model recited in independent Claim 8 at lines 5-6. Dependent claims are also subsequently rejected.
Claim 17, line 11, recites the limitation “the computer simulations” which lacks antecedent basis. Dependent claims are subsequently rejected.
Response to Amendment
The previous 35 USC 112(d) rejection on Claim 19 is withdrawn based on the amendments to the claim.
Response to Arguments
Applicant’s arguments filed on 08/13/2022 have been fully considered but are not persuasive. As an initial matter, the Examiner points out that any additional references cited in the rejection of claim limitations by way of Examiner’s notes are provided to give the applicant additional related prior art, made of record, that is considered pertinent to those limitations. Such comments are entirely consistent with the intent and spirit of compact prosecution. In response to the applicant’s assertion that the prior art does not teach the limitations of Claim 1 (see pp. 8-10 of Remarks), the Examiner points out that Ghanta is used to teach the pipelines for machine learning model training and inference, and Mnih is specifically used to address the limitations pertaining to the use of reinforcement learning as applied to computer games, as emphasized in the rejection. With regard to the applicant’s argument that Mnih does not teach “equating a recommendation associated with a time “t” to a reward associated with the time “t” plus a product of a discount factor and a recommendation associated with a time t+1” (see pp. 10-13 of Remarks), the Examiner points out that the linearized application of the Bellman equation, as discussed collectively in paragraphs 50-54 of Mnih, teaches this limitation. See specifically the future discounted rewards indicated in the summation equation of paragraph 50 of Mnih, where the summation will include only the first two terms, that is, the terms for time step t and time step t+1, the second term being multiplied by the discount factor and the discount factor of the first term being equal to 1 because its exponent is zero when the summation indices are equal. The two-term equation therefore becomes: R_t = 1·r_t + γ·r_{t+1}. This was also pointed out in an Examiner’s note in the Office Action rejection, citing the NPL of Crandall, p. 289, equation 5.
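For illustration only, the two-term truncation of the discounted-reward summation described above can be checked numerically. The following is a minimal sketch and is not taken from Mnih, Crandall, or the application; the function name discounted_return, the reward values, and the discount factor are hypothetical and chosen only to make the arithmetic visible.

```python
def discounted_return(rewards, gamma, horizon=None):
    """Return sum of gamma**k * rewards[k] over the first `horizon` terms.

    rewards[0] plays the role of r_t, rewards[1] of r_{t+1}, and so on.
    With horizon=None every term in `rewards` is included.
    """
    terms = rewards if horizon is None else rewards[:horizon]
    return sum((gamma ** k) * r for k, r in enumerate(terms))


# Hypothetical values, for illustration only.
rewards = [1.0, 2.0, 4.0]  # r_t, r_{t+1}, r_{t+2}
gamma = 0.5                # discount factor

# Keeping only the first two terms: the first term carries gamma**0 == 1,
# so the result equals r_t + gamma * r_{t+1}.
two_term = discounted_return(rewards, gamma, horizon=2)
assert two_term == rewards[0] + gamma * rewards[1]
```

With these assumed values the two-term result is 1.0 + 0.5 × 2.0 = 2.0, which matches the form R_t = 1·r_t + γ·r_{t+1} stated above.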
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5-6, 8-12, 14-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta, US 2020/0034665 A1, in view of Campos, US 2018/0357552 A1, and further in view of Mnih, US 2015/0100530 A1.

Regarding Claim 1, Ghanta teaches:
An apparatus, comprising: at least one processor comprising instructions executable by the at least one processor to: 
input the data to a training service of a first pipeline of model generation computerized services to train plural recommendation models (paragraphs 52-53, 56: datasets used in machine learning pipelines for training machine learning algorithms/models); 
use an inference service of the first pipeline to generate recommendations based on recommendation models trained using the training service in the first pipeline (paragraphs 56-57, 60: inference pipeline for providing inference/recommendations for the objective using the machine learning models); 
provide output of the inference service to an experimentation service of the first pipeline to test the recommendations to select a subset of the models using at least one key performance indicator (KPI) (Abstract; paragraphs 42, 59, 71-72, 74: selecting the machine learning pipeline and model based on the best fit for the objective being analyzed, that is an evaluation of key performance of the model); 
use a training and an inference service of a second pipeline to provide recommendations of at least some recommendation models to train (Abstract; paragraphs 38, 102: the second machine learning algorithm predicts the suitability of the first machine learning model for analyzing the inference data set, that is determining a recommendation of whether to use/train the first machine learning model); 
provide the recommendations of the at least some models to train generated by the second pipeline to the training service of the first pipeline (Abstract; paragraphs 3, 56, 102: providing recommendation of the machine learning algorithm/model). 
Ghanta may not have explicitly taught:
receive data representing input to computer simulations by plural simulation players. (Emphasis added).
However, Campos shows this limitation (paragraphs 110, 112, 124-130: discussing the management of multiple simulations occurring at the same time, which can be from simulation players, to train multiple AI concepts).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Campos with that of Ghanta for receiving data representing simulations.
The ordinary artisan would have been motivated to modify Ghanta in the manner set forth above for the purposes of improving a system’s capability to extract and optimize knowledge faster from large and complex simulations and data, thereby making users using the system more productive and decreasing the duration of training to accomplish a complex task [Campos: paragraph 124].
Although Campos teaches reinforcement learning, neither Ghanta nor Campos may have taught all of the following:
execute a reinforcement learning model (RL) to use the training and inference services of the second pipeline to identify at least a first model from the first pipeline at least in part by maximizing a reward predicted for the first model, wherein the maximizing is executed at least in part by equating a recommendation associated with a time "t" to a reward associated with the time "t" plus a product of a discount factor and a recommendation associated with a time t+1; 
and execute at least one of the recommendation models to provide recommendations for new computer simulations to provide to players of at least one computer simulation, the computer simulations comprising computer games. (Emphasis added).
However, Mnih shows these limitations (paragraphs 10-11, 18, 26, 33, 47-53, 57, 60-63: collectively discussing the use of reinforcement learning that uses a discount factor for providing recommendations, or a next state for an action, to users/players of computer games. Also, see specifically the future discounted rewards indicated in the summation equation of paragraph 50 of Mnih, where the summation will include only the first two terms, that is, the terms for time step t and time step t+1, the second term being multiplied by the discount factor and the discount factor of the first term being equal to 1 because its exponent is zero when the summation indices are equal. The two-term equation therefore becomes: R_t = 1·r_t + γ·r_{t+1}. Examiner’s note: See also the NPL of Crandall, p. 289, equation 5; and the applicant’s provided NPL of Mnih, sections 2-3).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Mnih with that of Ghanta and Campos for executing a reinforcement learning model in part by maximizing a reward using a discount factor and providing recommendations to players of computer games.
The ordinary artisan would have been motivated to modify Ghanta and Campos in the manner set forth above for the purposes of using reinforcement learning for a control task such as computer gaming [Mnih: paragraphs 39, 93-94].
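For reference, the two-term reading of the summation relied upon in the rejection of Claim 1 can be written out symbolically. This is a sketch under a stated assumption, namely that the summation of paragraph 50 of Mnih has the standard discounted-return form; the upper limit T and the index t' are notation introduced here for illustration and are not quoted from the reference.

```latex
% Assumed standard form of the future discounted return (T and t' are illustrative notation):
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'}

% Keeping only the terms for time steps t and t+1 (that is, T = t+1):
R_t = \gamma^{0}\, r_t + \gamma^{1}\, r_{t+1} = r_t + \gamma\, r_{t+1}
```

The first term carries the factor γ^0 = 1 because the summation index equals t, which is the point made in the rejection about the exponent being zero.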

Regarding Claim 2, Ghanta further teaches:
The apparatus of Claim 1, wherein the instructions are executable to: classify the recommendation models in the second pipeline to generate classifications (paragraphs 76, 94: the machine learning algorithms/models include classification models for classification).

Regarding Claim 5, Ghanta further teaches:
The apparatus of Claim 1, wherein the instructions are executable to: execute an evolution strategy model (ES) to use the training and inference services of the second pipeline to use at least the first model identified by the training service of the second pipeline to identify future models to be trained by the first pipeline (paragraphs 71-72, 74, 102: determining the best machine model for training or future use).

Regarding Claim 6, Ghanta further teaches:
The apparatus of Claim 5, wherein the instructions are executable to execute the ES to learn, based on the classifications, model meta-data; and generate the future models at least in part based on the meta-data (paragraphs 3, 37, 80: the second machine learning algorithm/model trained/generated using an error data set that is meta data. Examiner’s note: See also Campos, paragraphs 158, 164, meta learning using meta-data).

Regarding Claim 19, Mnih further teaches:
The apparatus of Claim 1, wherein the discount factor is chosen based at least in part on taking an immediately suboptimal action and maximizing future reward (paragraphs 18, 50-51: discount factor based on maximizing future rewards based on previous actions. Note that if this previous action is not the last action taken, then it is necessarily not the optimal action but a suboptimal action. Examiner’s note: See also the NPL of Crandall, p. 289).

Regarding Claim 8, Ghanta teaches:
A system, comprising: a first plurality of computers implementing a first pipeline for training models and providing model predictions; a second plurality of computers implementing a second pipeline for receiving the models from the first pipeline, identifying at least a first model of the models from the first pipeline as being a model satisfying at least one criterion, and feeding back the first model to the first pipeline to enable the first pipeline to generate new models (paragraphs 52-53, 56: machine learning pipelines for training machine learning algorithms/models. And, paragraphs 56-57, 60: inference pipeline for providing inference/predictions for the objective using the machine learning models. And, Abstract; paragraphs 42, 59, 71-72, 74: selecting the machine learning pipeline and model based on the best fit for the objective being analyzed, that is a criterion of the model. And, Abstract; paragraphs 38, 64, 67, 102: the second machine learning algorithm predicts the suitability of the first machine learning model for analyzing the inference data set, that is determining a recommendation or feedback of whether to use/train the first machine learning model or to generate new models). 
Ghanta may not have explicitly taught:
for new computer simulations to provide to players of at least one computer simulation. (Emphasis added).
However, Campos shows this limitation (paragraphs 110, 112, 124-130: discussing the management of multiple simulations occurring at the same time, which can be from simulation players, to train multiple AI concepts).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Campos with that of Ghanta for receiving data representing simulations.
The ordinary artisan would have been motivated to modify Ghanta in the manner set forth above for the purposes of improving a system’s capability to extract and optimize knowledge faster from large and complex simulations and data, thereby making users using the system more productive and decreasing the duration of training to accomplish a complex task [Campos: paragraph 124].
Although Campos teaches reinforcement learning, neither Ghanta nor Campos may have taught all of the following:
identify at least the first model from the first pipeline at least in part by maximizing a reward predicted for the first model by equating a recommendation associated with a time "t" to a reward associated with the time "t" plus a product of a discount factor and a recommendation associated with a time t+1;
and execute at least one of the models to provide recommendations 
wherein the computer simulations comprise computer games. (Emphasis added).
However, Mnih shows these limitations (paragraphs 10-11, 18, 26, 33, 47-53, 57, 60-63: collectively discussing the use of reinforcement learning that uses a discount factor for providing recommendations, or a next state for an action, to users/players of computer games. Also, see specifically the future discounted rewards indicated in the summation equation of paragraph 50 of Mnih, where the summation will include only the first two terms, that is, the terms for time step t and time step t+1, the second term being multiplied by the discount factor and the discount factor of the first term being equal to 1 because its exponent is zero when the summation indices are equal. The two-term equation therefore becomes: R_t = 1·r_t + γ·r_{t+1}. Examiner’s note: See also the NPL of Crandall, p. 289, equation 5; and the applicant’s provided NPL of Mnih, sections 2-3).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Mnih with that of Ghanta and Campos for executing a reinforcement learning model in part by maximizing a reward using a discount factor and providing recommendations to players of computer games.
The ordinary artisan would have been motivated to modify Ghanta and Campos in the manner set forth above for the purposes of using reinforcement learning for a control task such as computer gaming [Mnih: paragraphs 39, 93-94]. 

Regarding Claim 17, Ghanta teaches:
A method comprising: training prediction models using a first pipeline, the first pipeline being computerized; identifying at least a first model from the prediction models of the first pipeline using a second pipeline, the second pipeline being computerized, the identifying of at least the first model from the first pipeline being least in part by 
feeding back information associated with the first model to the first pipeline; and outputting recommendations using at least a first model among the prediction models (paragraphs 52-53, 56: machine learning pipelines for training machine learning algorithms/models. And, paragraphs 56-57, 60: inference pipeline for providing inference/predictions for the objective using the machine learning models. And, Abstract; paragraphs 42, 59, 71-72, 74: selecting the machine learning pipeline and model based on the best fit for the objective being analyzed. And, Abstract; paragraphs 38, 64, 67, 102: the second machine learning algorithm predicts the suitability of the first machine learning model for analyzing the inference data set, that is identifying whether to use/train the first machine learning model or to generate new models using feedback. And, Abstract; paragraphs 3, 56, 102: providing or outputting recommendations of the machine learning algorithm/model).  
Ghanta may not have explicitly taught:
the recommendations comprising computer simulation recommendations to provide to players of at least one computer simulation. (Emphasis added).
However, Campos shows this limitation (paragraphs 110, 112, 124-130: discussing the management of multiple simulations occurring at the same time, which can be from simulation players, to train multiple AI concepts).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Campos with that of Ghanta for receiving data representing simulations.
The ordinary artisan would have been motivated to modify Ghanta in the manner set forth above for the purposes of improving a system’s capability to extract and optimize knowledge faster from large and complex simulations and data, thereby making users using the system more productive and decreasing the duration of training to accomplish a complex task [Campos: paragraph 124].
Although Campos teaches reinforcement learning, neither Ghanta nor Campos may have taught all of the following:
maximizing a reward predicted for the first model by equating a recommendation associated with a time "t" to a reward associated with the time "t" plus a product of a discount factor and a recommendation associated with a time t+1; 
wherein the computer simulations comprise computer games. (Emphasis added).
However, Mnih shows these limitations (paragraphs 10-11, 18, 26, 33, 47-53, 57, 60-63: collectively discussing the use of reinforcement learning that uses a discount factor for providing recommendations, or a next state for an action, to users/players of computer games. Also, see specifically the future discounted rewards indicated in the summation equation of paragraph 50 of Mnih, where the summation will include only the first two terms, that is, the terms for time step t and time step t+1, the second term being multiplied by the discount factor and the discount factor of the first term being equal to 1 because its exponent is zero when the summation indices are equal. The two-term equation therefore becomes: R_t = 1·r_t + γ·r_{t+1}. Examiner’s note: See also the NPL of Crandall, p. 289, equation 5; and the applicant’s provided NPL of Mnih, sections 2-3).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Mnih with that of Ghanta and Campos for executing a reinforcement learning model in part by maximizing a reward using a discount factor and providing recommendations to players of computer games.
The ordinary artisan would have been motivated to modify Ghanta and Campos in the manner set forth above for the purposes of using reinforcement learning for a control task such as computer gaming [Mnih: paragraphs 39, 93-94].  

The limitations of Claims 9-10 are contained in Claim 1, and those claims are rejected under the same rationale as stated above for that claim.
Claim 11 is similar to Claim 2 and is rejected under the same rationale as stated above for that claim.
The limitations of Claim 12 are contained in Claim 1, and Claim 12 is rejected under the same rationale as stated above for that claim.
Claims 14-15 are similar to Claims 5-6 and are rejected under the same rationale as stated above for those claims.
The limitations of Claim 18 are contained in Claim 1, and Claim 18 is rejected under the same rationale as stated above for that claim.
Claim 20 is similar to Claim 5 and is rejected under the same rationale as stated above for that claim.

Examiner's Note:
The Examiner cites particular pages, sections, columns, line numbers, and/or paragraphs in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner, and the additional related prior art made of record that is considered pertinent to applicant's disclosure to further show the general state of the art. The Examiner's interpretations in parentheses are provided with the cited references to assist the applicant in better understanding how the examiner interprets the prior art to read on the claims. Such comments are entirely consistent with the intent and spirit of compact prosecution.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 for the relevant prior art relating to this application; for example, the NPL of Crandall teaches reinforcement learning using a discount factor for games.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVE MISIR whose telephone number is (571)272-5243. The examiner can normally be reached M-R 8-5 pm, F some hours.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVE MISIR/Primary Examiner, Art Unit 2127