DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2020-01-07 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Status
Claims 1-14 are pending in the application.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mental process, without significantly more.
Claim 1 recites:
- “process environmental information including data about one or more space-based assets and one or more threats”; processing information is an example of an “observation, evaluation, judgment, opinion” as recited in MPEP 2106.04(a), and is thus a mental process
- “providing a suggestion to an analyst for which course of action to follow to mitigate the one or more threats against the one or more space-based assets”; providing a suggestion is an example of an “observation, evaluation, judgment, opinion” as recited in MPEP 2106.04(a), and is thus a mental process
This judicial exception is not integrated into a practical application because the additional element “receive warnings from one or more sensors, wherein the warnings require a course of action” amounts to insignificant extra-solution activity, specifically “necessary data gathering and outputting” (see MPEP 2106.05(g)(3)).  Furthermore, “providing a reinforcement machine learning agent trained from a physics based space simulation” amounts to merely generally linking the use of a judicial exception to a particular technological environment (see MPEP 2106.05(h)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, again for the reasons recited above.  
Dependent claims 2-10 are also determined to be directed to a mental process.
Claim 2 recites “wherein the system provides a plurality of suggested courses of action along with respective confidence intervals”; calculating confidence intervals can be performed with pen and paper, and is thus a mental process.
Claim 3 recites “wherein processing environmental information includes assessing sample courses of action in sample situations”; assessing courses of action can be performed in the human mind, and is thus a mental process.
Claim 4 recites “wherein reinforcement machine learning comprises a reward function to push the system to learn ideal responses to various situations”; as discussed above in Claim 1, the use of the RL model amounts to merely generally linking the use of a judicial exception to a particular technological environment (see MPEP 2106.05(h)).
Claim 5 recites “wherein reinforcement machine learning comprises a loss function to calculate how well the system estimates a situation and the ideal action to take”; as discussed above in Claim 1, the use of the RL model amounts to merely generally linking the use of a judicial exception to a particular technological environment (see MPEP 2106.05(h)).
Claim 6 recites “wherein environmental information is input via a text file, graphical user interface, or direct connection to sensors that output the environmental information”; this amounts to insignificant extra-solution activity, specifically “necessary data gathering and outputting” (see MPEP 2106.05(g)(3)).
Claim 7 recites “wherein a suggestion is output via a file that displays which course of action should be taken along with what the agent took into consideration”; this amounts to insignificant extra-solution activity, specifically “necessary data gathering and outputting” (see MPEP 2106.05(g)(3)).
Claim 8 recites “wherein the file is displayed onto a graphical user interface”; this amounts to insignificant extra-solution activity, specifically “necessary data gathering and outputting” (see MPEP 2106.05(g)(3)).
Claim 9 recites “wherein the system is embedded into a space based asset”; this amounts to merely generally linking the use of a judicial exception to a particular technological environment (see MPEP 2106.05(h)). 
Claim 10 recites “wherein the space based asset is a satellite”; this amounts to merely generally linking the use of a judicial exception to a particular technological environment (see MPEP 2106.05(h)).
Remarks - 35 USC § 101
Examiner notes that Claims 11-14 are not rejected under 35 USC 101 because they explicitly recite the process of training (“training a reinforcement machine learning agent using data on the space based asset and a plurality of threats”) using a simulation (“processing the policy and the value of action with a simulator”).  However, Claim 1 recites the pre-existence of a simulator-trained machine learning model, in which case making use of this existing model amounts to merely linking a mental process of performing some evaluations of data, to a technological environment (an RL model trained on a physics simulator).  Examiner notes that the 101 Rejections on Claims 1-10 can be overcome by simply positively reciting the training process, similarly to Claim 11 (see MPEP 2106.04(a)(1)(vii), which states that a method of training a machine learning model is not a mental process).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, and 9-14 are rejected under 35 U.S.C. 103 as being unpatentable over Broida et al. (“Spacecraft Rendezvous Guidance in Cluttered Environments Via Reinforcement Learning”; hereinafter “Broida”) in view of Shalev-Shwartz et al. (US 2018/0032082 A1; hereinafter “Shalev-Shwartz”).
As per Claim 1, Broida teaches a method of threat mitigation for space-based assets, comprising: (Broida, Page 1 Abstract, discloses:  “After sufficient training, we evaluate the performance of PPO developed policies in a simulated satellite rendezvous environment including a keep-out zone to protect against collisions.”)
providing a reinforcement machine learning agent trained from a physics based space simulation (Broida, Page 1 Abstract, discloses:  “This paper investigates the use of Reinforcement Learning for closed-loop control applied to satellite rendezvous missions.”  Broida, Page 2 Top, discloses:  “Reinforcement Learning is a general class of machine learning techniques that use repeated simulations of an environment in conjunction with a reward function to develop an implicit environmental model.”  Here, Broida discloses that the RL model is trained using a simulation.  Broida, Page 4 Bottom, discloses:

[Image media_image1.png: excerpt from Broida, Page 4 Bottom, setting out the equations of motion used for the simulation; image not reproduced]

Here, Broida discloses that the simulation is physics based, as equations describing motion through space are what the authors “utilize for our simulation”.  Broida, Page 7 Last Sentence, discloses:  “The simulation was run for 200,000 episodes before we considered the training to be complete.”)
wherein the reinforcement machine learning agent is configured to: 
process environmental information including data about one or more space-based assets and one or more threats; (Broida, Page 2 Bottom, discloses:  “This work considers the problem of moving one spacecraft into a docked position with another spacecraft. In this section we describe the RL set-up that is employed to find a 3-DOF closed-loop policy that generates collision free trajectories capable of a) driving the docking satellite to the desired location in a relative orbit frame with pinpoint accuracy and b) capable of doing so in an efficient manner without colliding with the docking target.”  Here, Broida discloses processing environmental information including data about one or more space-based assets (“spacecraft”) and one or more threats (“colliding with the docking target”)).
However, Broida does not explicitly teach receive warnings from one or more sensors, wherein the warnings require a course of action.  Broida does appear to suggest that sensors are being used, as Broida Page 2 Para 2 discloses:  “Like the satellite docking problem, autonomous car navigation involves complexities such as keep-out zones, real-time error handling, and collision avoidance. Although the dynamics for these two problems differ greatly, the same process of mapping sensor data to robust navigational commands would be ideal for a satellite moving through space.”  Broida also states on Page 11 Conclusion:  “Additionally, we would like to bolster the simulation with more realistic thruster models and sensor systems.”  Therefore, it appears to Examiner that, despite not explicitly mentioning how sensors are used in the Problem Formulation portion of the paper, Broida is making use of sensors, as implied above by use of the terms “same” and “more realistic”.  However, for the sake of clarity of the record, Examiner will bring in a secondary reference to explicitly teach sensors.
Broida also does not explicitly teach providing a suggestion to an analyst for which course of action to follow to mitigate the one or more threats against the one or more space-based assets, because Broida’s system appears to be fully automated.  Broida, Page 1 Introduction, discloses:  “We would like to develop an autonomous navigation system capable of handling any combination of these difficulties”.
Shalev-Shwartz teaches receive warnings from one or more sensors, wherein the warnings require a course of action (Shalev-Shwartz, Para [0119], discloses:  “As discussed below in further detail and consistent with various disclosed embodiments, system 100 may provide a variety of features related to autonomous driving and/or driver assist technology. For example, system 100 may analyze image data, position data (e.g., GPS location information), map data, speed data, and/or data from sensors included in vehicle 200. System 100 may collect the data for analysis from, for example, image acquisition unit 120, position sensor 130, and other sensors. Further, system 100 may analyze the collected data to determine whether or not vehicle 200 should take a certain action, and then automatically take the determined action without human intervention. For example, when vehicle 200 navigates without human intervention, system 100 may automatically control the braking, acceleration, and/or steering of vehicle 200 (e.g., by sending control signals to one or more of throttling system 220, braking system 230, and steering system 240). Further, system 100 may analyze the collected data and issue warnings and/or alerts to vehicle occupants based on the analysis of the collected data. Additional details regarding the various embodiments that are provided by system 100 are provided below.”  Here, Shalev-Shwartz discloses receive warnings from sensors (“data from sensors”) wherein the warnings require a course of action (“analyze the collected data to determine whether or not vehicle 200 should take a certain action”)).
and providing a suggestion to an analyst for which course of action to follow to mitigate the one or more threats against the one or more [space-based] assets (Recall above that Broida discloses space-based assets.  Shalev-Shwartz, Para [0119] shown above, discloses:  “Further, system 100 may analyze the collected data and issue warnings and/or alerts to vehicle occupants based on the analysis of the collected data”)
Broida and Shalev-Shwartz are analogous art because Broida is directed to using reinforcement learning to control space-based assets, trained using a simulator, and Shalev-Shwartz discloses using reinforcement learning to control a car, trained using a simulator (see Shalev-Shwartz [0189]:  “Another approach for training driving policy module 803 may include decomposing the driving policy function into semantically meaningful components. This allows implementation of parts of the policy manually, which may ensure the safety of the policy, and implementation of other parts of the policy using reinforcement learning techniques, which may enable adaptivity to many scenarios, a human-like balance between defensive/aggressive behavior, and a human-like negotiation with other drivers. From the technical perspective, a reinforcement learning approach may combine several methodologies and offer a tractable training procedure, where most of the training can be performed using either recorded data or a self-constructed simulator.”).  Broida themselves point out the applicability of the autonomous car to the satellite problem on Page 2 Para 2:  “Like the satellite docking problem, autonomous car navigation involves complexities such as keep-out zones, real-time error handling, and collision avoidance. Although the dynamics for these two problems differ greatly, the same process of mapping sensor data to robust navigational commands would be ideal for a satellite moving through space.”
It would have been obvious before the effective filing date of the claimed invention to combine the RL-based satellite control of Broida with the sensors and action suggestions to the operator of the asset of Shalev-Shwartz.  One of ordinary skill in the art would be motivated to do so in order to better ensure safety of the asset, as human judgment may be more advantageous to make the final decision or maneuver (Shalev-Shwartz, [0154]:  “Thus, based on the confidence level, processing unit 110 may delegate control to the driver of vehicle 200 in order to improve safety conditions.”)

As per Claim 3, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches wherein processing environmental information includes assessing sample courses of action in sample situations. (Broida, Page 6 Top, discloses:  “We can see that if the critic provides an accurate estimate of the true value Vπθ , then the advantage will designate which actions provide more or less reward than the current policy generates”.  Recall above that Broida discloses a simulation (“sample situations”) and here Broida discloses assessing sample courses of action (“which actions provide more or less reward”)).

As per Claim 4, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches wherein reinforcement machine learning comprises a reward function to push the system to learn ideal responses to various situations. (Broida, Page 5 Para 2, discloses:  “The randomized policy is then sampled within a simulation. At each timestep t, the simulation outputs an observation xt and a reward rt(xt), the former of which is fed into the neural network to determine the next control command ut.”)
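For illustration only, the observation, reward, and control exchange described in the quoted passage can be sketched as follows. This sketch is not taken from Broida: the names ToySim, policy, and rollout are hypothetical, the dynamics are a toy one-dimensional point mass rather than orbital equations of motion, and a fixed control rule stands in for the neural network.

```python
from dataclasses import dataclass


@dataclass
class ToySim:
    """Hypothetical 1-D point mass; an assumption, not Broida's dynamics."""
    pos: float = 10.0
    vel: float = 0.0
    dt: float = 1.0

    def step(self, u):
        # Treat the control u as an acceleration and integrate one timestep.
        self.vel += u * self.dt
        self.pos += self.vel * self.dt
        obs = (self.pos, self.vel)
        # Reward grows (toward zero) as the target at pos = 0 is approached.
        reward = -abs(self.pos)
        return obs, reward


def policy(obs):
    # Stand-in for the policy network: a fixed proportional-derivative rule.
    pos, vel = obs
    return -0.1 * pos - 0.5 * vel


def rollout(steps=50):
    sim = ToySim()
    obs = (sim.pos, sim.vel)
    rewards = []
    for _ in range(steps):
        u = policy(obs)        # observation -> control command
        obs, r = sim.step(u)   # control -> next observation and reward
        rewards.append(r)
    return obs, rewards
```

In an actual reinforcement learning setup the collected rewards would be used to update the policy; the sketch shows only the data flow between the simulation and the agent.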

As per Claim 5, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches wherein reinforcement machine learning comprises a loss function to calculate how well the system estimates a situation and the ideal action to take. (Broida, Page 5 Penultimate Paragraph:  “In addition to the neural network driving the policy (the actor), Actor-Critic methods have a second network which is intended to critique the policy (the critic). It is the critic’s job to take in the observation at timestep t and output the value function of the actor, Vπθ (xt). Gradients of the critic are estimated by minimizing the loss function”).
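For illustration only, one common form of critic loss is the mean squared error between the critic's value estimates and the discounted returns actually observed. The sketch below assumes that form; the quoted passage does not reproduce Broida's loss expression, and the names discounted_returns and critic_loss and the default discount factor are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every timestep t."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def critic_loss(value_estimates, rewards, gamma=0.99):
    """Mean squared error between the critic's estimates and observed returns."""
    targets = discounted_returns(rewards, gamma)
    errors = [(v - g) ** 2 for v, g in zip(value_estimates, targets)]
    return sum(errors) / len(errors)
```

A loss of zero means the critic predicted the observed returns exactly; larger values indicate a poorer estimate of the situation.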

As per Claim 6, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Shalev-Shwartz teaches wherein environmental information is input via a text file, graphical user interface, or direct connection to sensors that output the environmental information. (Shalev-Shwartz, Para [0119], discloses direct connection to sensors:  “As discussed below in further detail and consistent with various disclosed embodiments, system 100 may provide a variety of features related to autonomous driving and/or driver assist technology. For example, system 100 may analyze image data, position data (e.g., GPS location information), map data, speed data, and/or data from sensors included in vehicle 200.”)

As per Claim 9, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches a space based asset (Broida, Page 1 Abstract, discloses:  “This paper investigates the use of Reinforcement Learning for closed-loop control applied to satellite rendezvous missions.”)
However, Broida does not explicitly teach that the system is embedded into a space based asset.
Shalev-Shwartz teaches the system is embedded into a [space based] asset.  (Shalev-Shwartz, Para [0081], discloses:  “In some embodiments, system 100 may be included on a vehicle 200, as shown in FIG. 2A. For example, vehicle 200 may be equipped with a processing unit 110 and any of the other components of system 100, as described above relative to FIG. 1.”)
Broida and Shalev-Shwartz are analogous art for the reasons recited in Claim 1.  It would have been obvious before the effective filing date of the claimed invention to combine the RL-based satellite control of Broida with the inclusion of the system onboard of Shalev-Shwartz.  One of ordinary skill in the art would be motivated to do so in order to more quickly get safety-critical information to the operator of the asset (Shalev-Shwartz, Para [0151]: “At step 558, processing unit 110 may consider additional sources of information to further develop a safety model for vehicle 200 in the context of its surroundings. Processing unit 110 may use the safety model to define a context in which system 100 may execute autonomous control of vehicle 200 in a safe manner.”)


As per Claim 10, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches wherein the space based asset is a satellite (Broida, Page 1 Abstract, discloses:  “This paper investigates the use of Reinforcement Learning for closed-loop control applied to satellite rendezvous missions.”)

As per Claim 11, Broida teaches operations for mitigating threats to the space based asset, the operations comprising (Broida, Page 1 Abstract, discloses:  “After sufficient training, we evaluate the performance of PPO developed policies in a simulated satellite rendezvous environment including a keep-out zone to protect against collisions.”)
training a reinforcement machine learning agent using data on the space based asset and a plurality of threats; (Broida, Page 1 Abstract, discloses:  “After sufficient training, we evaluate the performance of PPO developed policies in a simulated satellite rendezvous environment including a keep-out zone to protect against collisions.”  Broida, Page 2 Top, discloses:  “Reinforcement Learning is a general class of machine learning techniques that use repeated simulations of an environment in conjunction with a reward function to develop an implicit environmental model.”)
computing a policy and a value of action at a given state on the reinforcement machine learning agent; (Broida, Page 2 Top, discloses:  “In this paper, we will be applying the Proximal Policy Optimization (PPO) version of RL. PPO is an on-policy gradient descent algorithm. It works by creating two neural network, one that parametrizes the agents policy and one that parametrizes a value function for the current policy (the critic). As the agent explores the state space, the critic evaluates actions in the context of the policy’s current expected performance level (the advantage).”)
processing the policy and the value of action with a simulator, wherein the simulator sends back new state information to the reinforcement machine learning agent which computes new policy and new value of action; (Broida, Page 5 Para 2, discloses:  “The randomized policy is then sampled within a simulation. At each timestep t, the simulation outputs an observation xt and a reward rt(xt), the former of which is fed into the neural network to determine the next control command ut. The control is then sent back to the simulation to determine the next state.”)
making a decision by the reinforcement machine learning agent and matching to a course of action; (Broida, Page 9 Bottom:  “In this way, we chose the action that the policy determines to be the most likely to generate the maximum reward”)
However, Broida does not explicitly teach a computer program product including one or more non-transitory machine-readable mediums having instructions encoded thereon that, when executed by one or more processors on board a space based asset, result in operations, or providing the course of action for execution.
Shalev-Shwartz teaches a computer program product including one or more non-transitory machine-readable mediums having instructions encoded thereon that, when executed by one or more processors on board a [space based] asset, resulting in operations (Recall above that Broida discloses a space based asset.  Shalev-Shwartz, Para [0024], discloses:  “Consistent with other disclosed embodiments, non-transitory computer-readable storage media may store program instructions, which are executed by at least one processing device and perform any of the methods described herein.”)
and providing the course of action for execution. (Shalev-Shwartz, Para [0119], discloses:  “Further, system 100 may analyze the collected data and issue warnings and/or alerts to vehicle occupants based on the analysis of the collected data”)
Broida and Shalev-Shwartz are analogous art because Broida is directed to using reinforcement learning to control space-based assets, trained using a simulator, and Shalev-Shwartz discloses using reinforcement learning to control a car, trained using a simulator (see Shalev-Shwartz [0189]:  “Another approach for training driving policy module 803 may include decomposing the driving policy function into semantically meaningful components. This allows implementation of parts of the policy manually, which may ensure the safety of the policy, and implementation of other parts of the policy using reinforcement learning techniques, which may enable adaptivity to many scenarios, a human-like balance between defensive/aggressive behavior, and a human-like negotiation with other drivers. From the technical perspective, a reinforcement learning approach may combine several methodologies and offer a tractable training procedure, where most of the training can be performed using either recorded data or a self-constructed simulator.”).  Broida themselves point out the applicability of the autonomous car to the satellite problem on Page 2 Para 2:  “Like the satellite docking problem, autonomous car navigation involves complexities such as keep-out zones, real-time error handling, and collision avoidance. Although the dynamics for these two problems differ greatly, the same process of mapping sensor data to robust navigational commands would be ideal for a satellite moving through space.”
It would have been obvious before the effective filing date of the claimed invention to combine the RL-based satellite control of Broida with the sensors and action suggestions to the operator of the asset of Shalev-Shwartz.  One of ordinary skill in the art would be motivated to do so in order to better ensure safety of the asset, as human judgment may be more advantageous to make the final decision or maneuver (Shalev-Shwartz, [0154]:  “Thus, based on the confidence level, processing unit 110 may delegate control to the driver of vehicle 200 in order to improve safety conditions.”)

As per Claim 12, the combination of Broida and Shalev-Shwartz teaches the computer program product according to claim 11.  Shalev-Shwartz teaches further comprising post-processing the course of action. (Shalev-Shwartz, Para [0119], discloses:  “Further, system 100 may analyze the collected data and issue warnings and/or alerts to vehicle occupants based on the analysis of the collected data”.  As the alerts are issued to humans, there must have been some post-processing to put the actions in a human understandable format, similar to Instant Specification [0075]:  “post-processing is done to convert numerical courses of action generated via the agent in the simulation into human readable course of action.”  Furthermore, Shalev-Shwartz discloses converting the information to audio format in [0116]:  “For example, system 100 may provide various notifications (e.g., alerts) via speakers 360.”)

As per Claim 13, the combination of Broida and Shalev-Shwartz teaches the computer program product according to claim 11.  Shalev-Shwartz teaches wherein providing the course of action is providing the course of action to an operator. (Shalev-Shwartz, Para [0119], discloses:  “Further, system 100 may analyze the collected data and issue warnings and/or alerts to vehicle occupants based on the analysis of the collected data”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Broida and Shalev-Shwartz for at least the reasons recited in Claim 11.

As per Claim 14, the combination of Broida and Shalev-Shwartz teaches the computer program product according to claim 11. Broida teaches wherein making the decision is done at an end of a simulation. (Broida, Page 9 Bottom, discloses:  “We evaluate the trained networks by deploying them in the same simulation we used to train them. However, rather than determine the agent’s action by sampling the normal distribution output by the policy network, we take the mean of the distribution instead. In this way, we chose the action that the policy determines to be the most likely to generate the maximum reward.”  Here, Broida discloses choosing an action after the simulation has been run to train the model.)
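For illustration only, the difference between sampling an action during training and taking the mean of the policy distribution during evaluation, as described in the quoted passage, can be sketched as follows. The function name select_action and its parameters are hypothetical, and a single scalar action is assumed for simplicity.

```python
import random


def select_action(mean, std, deterministic, rng=None):
    """Pick an action from a Gaussian policy output (mean, std).

    deterministic=True  -> evaluation: return the mean of the distribution.
    deterministic=False -> training: draw a random sample for exploration.
    """
    if deterministic:
        return mean
    rng = rng or random.Random()
    return rng.gauss(mean, std)
```

Taking the mean removes the exploration noise, so repeated evaluations from the same state yield the same action.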

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Broida and Shalev-Shwartz, further in view of  White et al. (“Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains”; hereinafter “White”).
As per Claim 2, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  Broida teaches wherein the system provides a plurality of suggested courses of action (Broida, Page 6 Top, discloses:  “We can see that if the critic provides an accurate estimate of the true value Vπθ , then the advantage will designate which actions provide more or less reward than the current policy generates”.  Here Broida discloses a plurality of suggested courses of action (“which actions provide more or less reward”)).
However, Broida does not explicitly teach along with respective confidence intervals.
White teaches along with respective confidence intervals. (White, Page 1 Intro Para 1, discloses:  “Many reinforcement learning algorithms estimate values for states to enable selection of maximally rewarding actions. Obtaining confidence intervals on these estimates has been shown to be useful in practice, including directing exploration.”  White, Page 1 Intro Para 2, discloses:  “In this work we focus on constructing confidence intervals for online model-free reinforcement learning agents.”)
White and the combination of Broida and Shalev-Shwartz are analogous art because they are both in the field of endeavor of reinforcement learning.
It would have been obvious before the effective filing date of the claimed invention to combine the RL-based satellite guidance of Broida and Shalev-Shwartz with the confidence intervals of White.  One of ordinary skill in the art would be motivated to do so in order to better find optimal actions by balancing exploitation of known actions with exploration of new actions, and better adapting to a non-stationary environment of space which is constantly changing (White, Page 1 Intro Para 1:  “Obtaining confidence intervals on these estimates has been shown to be useful in practice, including directing exploration [17, 19] and deciding when to exploit learned models of the environment [3]. Moreover, there are several potential applications using confidence estimates, such as teaching interactive agents (using confidence estimates as feedback), adjusting behaviour in non-stationary environments and controlling behaviour in a parallel multi-task reinforcement learning setting.”)
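For illustration only, presenting several candidate courses of action together with an interval around each estimated value can be sketched as follows. This is not White's method: a textbook normal-approximation interval on sampled returns is substituted, and the function names, the action labels, and the z value of 1.96 (roughly a 95% interval) are assumptions.

```python
import math
import statistics


def mean_confidence_interval(samples, z=1.96):
    """Normal-approximation interval around the sample mean."""
    m = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return m - half_width, m + half_width


def rank_actions(returns_by_action):
    """Attach an interval to each action and sort by interval midpoint, best first."""
    report = {a: mean_confidence_interval(s) for a, s in returns_by_action.items()}
    return sorted(report.items(), key=lambda kv: -(kv[1][0] + kv[1][1]) / 2)
```

Each entry in the ranked list pairs a course of action with the lower and upper bounds of its estimated value.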

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Broida and Shalev-Shwartz, further in view of Schwalb (US 10,816,978 B1).
As per Claim 7, the combination of Broida and Shalev-Shwartz teaches the method according to claim 1.  However, the combination of Broida and Shalev-Shwartz does not explicitly teach wherein a suggestion is output via a file that displays which course of action should be taken along with what the agent took into consideration.
Schwalb teaches wherein a suggestion is output via a file that displays which course of action should be taken along with what the agent took into consideration. (Schwalb, Col 15 Lines 15-26, discloses:  “In some arrangements, the pixelated image (e.g., the pixilated dashboards 220 and 300) can be displayed to an operator operating the simulation system 100. The layout of the pixelated image is sufficiently intuitive and user-friendly such that the operator can immediately understand the basis for the actuator commands generated by the AI driver. In some arrangements, each pixelated image (e.g., each frame) can be stored in a suitable database with corresponding actuator commands to allow later review of the simulated environment as portrayed in each pixelated image and the corresponding actuator commands without having to rerun the simulation”.  Here, Schwalb discloses that the suggestion, in the form of a pixelated image, can be “stored in a suitable database”, and is thus stored in a “file”.  This output displays which course of action should be taken along with what the agent took into consideration (“the operator can immediately understand the basis for the actuator commands generated by the AI driver”)).
Schwalb and the combination of Broida and Shalev-Shwartz are analogous art because they are both directed to using reinforcement learning to control assets (Schwalb, Col 4 Lines 14-15, discloses:  “Some implementations use closed-loop reinforcement learning for performing simulations and improving AI drivers.”)
It would have been obvious before the effective filing date of the claimed invention to combine the RL-based satellite control of Broida and Shalev-Shwartz with the output file and display of Schwalb.  One of ordinary skill in the art would be motivated to do so in order to better ensure safety of the asset, by allowing the human operator to take into consideration what led to the suggested decision, as well as allowing later review of the decision to help improve the efficiency of the system (Schwalb, Col 15 Lines 17-27:  “The layout of the pixelated image is sufficiently intuitive and user-friendly such that the operator can immediately understand the basis for the actuator commands generated by the AI driver.  In some arrangements, each pixelated image (e.g., each frame) can be stored in a suitable database with corresponding actuator commands to allow later review of the simulated environment as portrayed in each pixelated image and the corresponding actuator commands without having to rerun the simulation. Accordingly, computational efficiency can be improved.”)

As per Claim 8, the combination of Broida, Shalev-Shwartz, and Schwalb teaches the method according to claim 7.  Schwalb teaches wherein the file is displayed onto a graphical user interface (Schwalb, Col 15 Lines 15-17, discloses:  “In some arrangements, the pixelated image (e.g., the pixilated dashboards 220 and 300) can be displayed to an operator operating the simulation system 100.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Broida and Shalev-Shwartz with Schwalb for at least the reasons recited in Claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Shirvani et al. (US 2019/0235515 A1) discloses in [0156] using Reinforcement Learning (RL) to identify unsafe driving practices.  In [0202] it discloses an alert and a manual override control to the human driver.
Xu et al. (“Model-based deep reinforcement learning with heuristic search for satellite attitude control”) discloses using RL to control a satellite, where the model can be initially trained offline in a simulation.
Ma et al. (“Reinforcement Learning-Based Satellite Attitude Stabilization Method for Non-Cooperative Target Capturing”) discloses using RL to control a satellite, with training in a simulation.
Rodrigues-Ramos et al. (“A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform”) discloses using RL to control a UAV, with training in a simulation.
Li et al. (“A Dyna-Q-Based Solution for UAV Networks Against Smart Jamming Attacks”) discloses using RL to control a UAV to defend against jamming attacks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./             Examiner, Art Unit 2126
/ANN J LO/             Supervisory Patent Examiner, Art Unit 2126