DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-5, 7-11, 13-16, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kimura et al., US 2019/0272465 A1 (“Kimura”).
As to claim 1, Kimura discloses a computer-implemented method, comprising: 
receiving, from a customer of a simulation management service (The input may be from a variety of applications from users or customers, e.g., as in Paragraphs 40-43 or 62, or Figures 2A-C), computer-executable code defining a custom-designed reinforcement function for training a reinforcement learning model for a system (Figure 1 or Paragraphs 32-36 – e.g., an environment and expert used to train the system, computer code necessary in a system operating on a computer, e.g., as in Paragraph 145 or 148); 
storing the computer-executable code in association with an identifier of the custom-designed reinforcement function (Figure 1 or Paragraphs 145-148 – e.g., an identifier such as a name, address, index, or location is necessary in execution and storage on a computer); 
receiving a request to perform reinforcement learning for the system using a simulation application, the request specifying the identifier (Figure 3A or Paragraph 63 – e.g., “request for initiating the reinforcement learning process”, an identifier such as a name, address, index, or location is necessary in execution and storage on a computer); 
generating a simulation environment by at least using the identifier to obtain the computer-executable code and injecting the computer-executable code into the simulation application (Figures 1 and 3 or Paragraphs 48-50 or 71-73 – e.g., simulating states and transitions according to the inputted expert data and environment); and 
performing the reinforcement learning using the simulation environment (Figure 3 or Paragraphs 63-67).
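For illustration only, the storing, requesting, and injecting limitations recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions: every name in it (CODE_STORE, store_code, load_reward_function, Simulation, handle_request, and the "reach-goal" identifier) is hypothetical and appears in neither the claims nor Kimura, and the learning loop itself is omitted.

```python
# Illustrative sketch only: every name here is hypothetical and is taken from
# neither the claims nor Kimura.
from typing import Callable, Dict, Tuple

CODE_STORE: Dict[str, str] = {}  # identifier -> stored source code


def store_code(identifier: str, source: str) -> None:
    """Store customer-supplied code in association with an identifier."""
    CODE_STORE[identifier] = source


def load_reward_function(identifier: str) -> Callable[[int, int], float]:
    """Use the identifier to obtain the stored code and build a callable."""
    namespace: dict = {}
    # exec runs arbitrary code; this sketch assumes the stored code is trusted.
    exec(CODE_STORE[identifier], namespace)
    return namespace["reward"]


class Simulation:
    """Toy simulation application: positions 0..4 on a line, starting at 0."""

    def __init__(self, reward_fn: Callable[[int, int], float]) -> None:
        self.reward_fn = reward_fn  # the injected reinforcement function
        self.state = 0

    def step(self, action: int) -> Tuple[int, float]:
        """Apply an action (-1 or +1) and return (new state, reward)."""
        self.state = max(0, min(4, self.state + action))
        return self.state, self.reward_fn(self.state, action)


def handle_request(identifier: str) -> Simulation:
    """Generate a simulation environment for a request naming the identifier."""
    return Simulation(load_reward_function(identifier))


# A customer supplies code defining a custom reward function.
store_code(
    "reach-goal",
    "def reward(state, action):\n    return 1.0 if state == 4 else 0.0\n",
)
# A later request specifies only the identifier.
env = handle_request("reach-goal")
```

In this sketch the request carries nothing but the identifier, and the simulation receives the reward function only through the lookup, which is the relationship the claim language describes between the stored code and the generated environment.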
As to claim 2, Kimura discloses the method of claim 1.  Kimura further discloses selecting a set of states and a set of actions for the system as input to the simulation application (Figures 1 or 3 or Paragraphs 32-34 or 49 – e.g., “expert demonstrations”); obtaining, in response to using the set of states and the set of actions as input, a reward value corresponding to performance of the set of actions in the simulation environment based on the set of states (Figures 1 or 3 or Paragraphs 48-58 – e.g., reward estimation based on states and agent actions); and updating the reinforcement learning model based on the reward value (Figure 3 or Paragraphs 58 or 75).
As to claim 3, Kimura discloses the method of claim 1.  Kimura further discloses training, during execution of the simulation application, the reinforcement learning model to identify changes to the reinforcement learning model based on output of the simulation application (Figure 3 or Paragraphs 58 or 75 – e.g., reinforcement learning by updating parameters); and evaluating the reinforcement learning model based on the changes (Paragraphs 70 or 77 - the updated parameters are evaluated iteratively).
As to claim 4, Kimura discloses the method of claim 1.  Kimura further discloses selecting a state for the system as input to the simulation application to cause the simulation application to perform an action in response to the state (Figure 3 or Paragraphs 71-74 – e.g., presenting states to the agent to select an action); obtaining, in response to the action performed in response to the state, a reward value corresponding to performance of the action in the simulation environment in response to the state (Figure 3 or Paragraph 74); and updating the reinforcement learning model based on the reward value (Figure 3 or Paragraphs 58 or 75).
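For illustration only, the state, action, reward, and update sequence recited in claim 4 can be sketched with tabular Q-learning on a toy environment. This is an assumption made for the example: tabular Q-learning is a swapped-in technique and is not asserted to be the method of Kimura or of the application, and the names simulate, train, ACTIONS, and GOAL are hypothetical.

```python
# Illustrative sketch only: tabular Q-learning on a toy line of positions.
# The technique and all names are assumptions, not taken from Kimura or the claims.
import random
from typing import Dict, Tuple

ACTIONS = (-1, 1)  # move left or move right
GOAL = 4           # terminal position


def simulate(state: int, action: int) -> Tuple[int, float]:
    """Perform an action in response to a state; return (next state, reward)."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Update a Q-table from rewards observed under a random behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on steps per episode
            action = rng.choice(ACTIONS)                  # action for this state
            next_state, reward = simulate(state, action)  # obtain reward value
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Update the model based on the reward value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
            if state == GOAL:
                break
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
```

Because Q-learning is off-policy, the sketch can select actions at random during training and still recover a greedy policy that moves toward the goal from every non-terminal state.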
Claim 5 recites elements similar to claim 1, and is rejected for the same reasons.  The examiner notes that Kimura discloses performing the method on a computer with processors and memory, e.g., as in Figure 1 or Paragraphs 145-148.
As to claim 7, Kimura discloses the method of claim 5.  Kimura further discloses wherein the computer-executable instructions further cause the first system to: obtain a set of simulation environment parameters for augmenting the simulation environment (Figures 1 or 3 or Paragraphs 32-34, 49, or 71-74 – e.g., “expert demonstrations” and states); and inject the simulation environment parameters into the simulation application to apply the simulation environment parameters to the simulation environment (Figure 3 or Paragraphs 71-74 – e.g., presenting states to the agent to select an action).
As to claim 8, Kimura discloses the method of claim 5.  Kimura further discloses wherein the computer-executable instructions further cause the first system to: evaluate the computer-executable code to identify suggestions for modifications to the computer-executable code (Figure 3 or Paragraphs 58 or 75 – e.g., reinforcement learning by identifying parameters to update); provide the suggestions for the modifications to the computer-executable code (Figure 3 or Paragraphs 58 or 75 – e.g., updating parameters); and store the computer-executable code in association with an identifier of the reinforcement function (Figure 3 or Paragraphs 58 or 75, an identifier such as a name, address, index, or location is necessary in execution and storage on a computer).
As to claim 9, Kimura discloses the method of claim 5.  Kimura further discloses wherein the computer-executable instructions further cause the first system to: select a first state for the second system as input to the simulation application to cause the simulation application to perform an action in response to the first state (Figures 1 or 3 or Paragraphs 32-34 or 48-58 – e.g., “expert demonstrations” and states); obtain, in response to the action performed in response to the first state, a reward value corresponding to performance of the action in the simulation environment in response to the first state (Figures 1 or 3 or Paragraphs 48-58 – e.g., reward estimation based on states and agent actions); update, based on the reward value, the reinforcement learning model (Figure 3 or Paragraphs 58 or 75); and select, based on the reward value, a second state for the second system as second input to the simulation application (Figure 3 or Paragraphs 76-77 – e.g., reward analysis performed for a plurality of time steps).
As to claim 10, Kimura discloses the method of claim 5.  Kimura further discloses wherein the computer-executable instructions further cause the first system to: select a first state and a first action corresponding to the first state as input to the simulation application (Figure 3 or Paragraphs 71-74 – e.g., presenting states to the agent to select an action); obtain, in response to the input, a reward value corresponding to performance of the first action in the simulation environment based on the first state (Figure 3 or Paragraph 74); update, based on the reward value, the reinforcement learning model (Figure 3 or Paragraphs 58 or 75); and select, based on the reward value, a second state and a second action corresponding to the second state as input to the simulation application (Figure 3 or Paragraphs 76-77 – e.g., reward analysis performed for a plurality of time steps, with later time steps dependent on previous states and actions).
As to claim 11, Kimura discloses the method of claim 5.  Kimura further discloses wherein the computer-executable instructions further cause the first system to: evaluate, during execution of the simulation application, the reinforcement learning model to identify modifications to be applied to the reinforcement learning model based on output of the simulation application (Figure 3 or Paragraphs 58 or 75 – e.g., reinforcement learning by updating parameters); and update the reinforcement learning model to apply the modifications (Paragraphs 70 or 77 - the updated parameters are evaluated iteratively).
Claims 13-16 recite elements similar to claims 1, 4, 9, and 10, and are rejected for similar reasons.
As to claim 18, Kimura discloses the medium of claim 13.  Kimura further discloses wherein the instructions further cause the computer system to: receive a request to modify the simulation environment, the request specifying a set of parameters corresponding to modifications to the simulation environment (Figure 3 or Paragraphs 58, 62-64, or 75 – e.g., initiating reinforcement learning with a set of parameters and states); apply the set of parameters to the simulation environment to incorporate the modifications to the simulation environment (Figure 3 or Paragraphs 58, 62-64, or 75 – e.g., performing reinforcement learning based on the set of parameters and states); and update the model based on the modifications to the simulation environment (Figure 3 or Paragraphs 58 or 75).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 6, 12, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kimura.
As to claim 6, Kimura discloses the method of claim 5.  Kimura does not explicitly disclose wherein the computer-executable instructions further cause the first system to expose, via a graphical user interface, an editor to allow an entity to generate the computer-executable code.  However, Kimura does teach the use of programming languages and a user computer including a display or interface (Figure 10 or Paragraph 148).  It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to include an editor to allow an entity to generate the computer-executable code because editors are a commonly used way to allow a user to interface with code, and doing so would allow the user to use and operate the learning and simulation functionality.
As to claim 12, Kimura discloses the method of claim 5.  Kimura does not explicitly disclose wherein the computer-executable instructions further cause the first system to: provision a software container instance for execution of the simulation application; and provide the computer-executable code to the software container instance to inject the computer-executable code into the simulation application.  However, Kimura does teach the use of various programs, programming languages and a user computer including a display or interface (Figure 10 or Paragraph 148).  It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to use a software container instance for the simulation process because the use of software containers is common in the use of computer programs, and doing so would allow the user to more easily use and operate the simulation software.
Claim 17 recites elements similar to claim 12, and is rejected for the same reasons.
As to claim 19, Kimura discloses the medium of claim 13.  Kimura teaches determining modifications to the code, but does not explicitly teach proposing the modifications to a client by evaluating the computer-executable code to identify a set of proposed modifications to the computer-executable code; and transmitting the set of proposed modifications to a client to allow a user of the client to incorporate the set of proposed modifications into the computer-executable code.  However, Kimura does teach the use of various programs, programming languages and a user computer including a display or interface (Figure 10 or Paragraph 148).  It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to present modifications to the code to the user because requesting user permission for changes is commonly used in computers and user interfaces, and doing so would allow the user to have more control over the process.
Claim 20 recites elements similar to claim 6, and is rejected for the same reasons.  The examiner notes that a user interface and validation or compilation are well known in the art of programming editors or user interfaces.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYCE M AISAKA whose telephone number is (571)270-5808. The examiner can normally be reached M-F: 6:30AM-5:00PM PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jack Chiang, can be reached at (571)272-7483. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRYCE M AISAKA/Primary Examiner, Art Unit 2851