DETAILED ACTION
Claims 1-20 are presented.
The information disclosure statements (IDS) have been considered.
Drawings as originally filed are accepted.
Allowable Subject Matter
Claims 1-20 are allowed.
REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance: 
The claims are allowed in view of searches conducted, evaluation of the claims, and references of record. Specifically:
References considered relevant to the claimed subject matter include:
Vecerik et al. (US 2020/0104684) - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network. In one aspect, a method comprises: obtaining an expert observation; processing the expert observation using a generative neural network system to generate a given observation-given action pair, wherein the generative neural network system has been trained to be more likely to generate a particular observation-particular action pair if performing the particular action in response to the particular observation is more likely to result in the environment later reaching the state characterized by a target observation; processing the given observation using the action selection policy neural network to generate a given action score for the given action; and adjusting the current values of the action selection policy neural network parameters to increase the given action score for the given action.
Budden et al. (US 2020/0293883) - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network that is used to select actions to be performed by a reinforcement learning agent interacting with an environment. In particular, the actions are selected from a continuous action space and the system trains the action selection neural network jointly with a distribution Q network that is used to update the parameters of the action selection neural network.
Budden et al. (US 2020/0265305) - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. One of the systems includes (i) a plurality of actor computing units, in which each of the actor computing units is configured to maintain a respective replica of the action selection neural network and to perform a plurality of actor operations, and (ii) one or more learner computing units, in which each of the one or more learner computing units is configured to perform a plurality of learner operations.
The references disclose subject matter relevant to the analysis and examination of AI performance when transferred to a real-world environment. Nevertheless, they do not disclose a system, method, and CRM as claimed of record. For example:
obtaining policy data specifying a control policy for controlling a source agent interacting with a source environment to perform a particular task, wherein the policy data comprises data specifying a trained Q neural network, and wherein the trained Q neural network has been trained to receive a network input comprising an observation characterizing a state of the source environment and an action from a set of possible actions that can be performed by the agent and to generate a Q value that represents a return that would be received by the source agent if the action was performed by the source agent in response to the observation;
obtaining a validation data set generated from interactions of a target agent in a target environment, the validation data set comprising a plurality of trajectories, wherein:
each trajectory comprises a respective plurality of observation-action pairs,
each observation-action pair includes an observation and an action performed by the target agent in response to the observation,
the plurality of trajectories comprises a plurality of positive-reward trajectories and a plurality of negative-reward trajectories,
each positive-reward trajectory is a trajectory of actions in which the particular task was successfully completed by the target agent, and
each negative-reward trajectory is a trajectory of actions in which the particular task was not successfully completed by the target agent;
processing each observation-action pair in each of the trajectories using the trained Q neural network to generate a respective Q value for each of the observation-action pairs;
determining a positive aggregate Q value from the Q values for the observation-action pairs in the positive-reward trajectories;
determining a second aggregate Q value from the Q values for the observation-action pairs in a second subset of the plurality of trajectories that includes at least the negative-reward trajectories;
determining, from the positive aggregate Q value and the second aggregate Q value, a performance estimate that represents an estimate of a performance of the control policy in controlling the target agent to perform the particular task in the target environment; and
determining, based on the performance estimate, whether to deploy the control policy for controlling the target agent to perform the particular task in the target environment.
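For illustration only, the evaluation steps recited above can be sketched in Python as follows. The sketch rests on assumptions that are not drawn from the record: the aggregate is taken as a plain mean, the performance estimate as the difference between the two aggregates, the second subset as the negative-reward trajectories alone, and deployment as a comparison against a fixed threshold. The names toy_q, aggregate_q, performance_estimate, and should_deploy are hypothetical, and toy_q merely stands in for a trained Q neural network.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# An observation-action pair: (observation features, action index).
Pair = Tuple[Sequence[float], int]
Trajectory = List[Pair]
QFunction = Callable[[Sequence[float], int], float]


def toy_q(observation: Sequence[float], action: int) -> float:
    """Hypothetical stand-in for a trained Q neural network."""
    return observation[0] + action


def aggregate_q(q_fn: QFunction, trajectories: List[Trajectory]) -> float:
    """Mean Q value over every observation-action pair (assumed aggregate)."""
    return mean(q_fn(obs, act) for traj in trajectories for obs, act in traj)


def performance_estimate(
    q_fn: QFunction,
    positive: List[Trajectory],
    negative: List[Trajectory],
) -> float:
    """Difference between the positive and second aggregates (assumed form)."""
    positive_aggregate = aggregate_q(q_fn, positive)
    # Assumption: the second subset holds only the negative-reward trajectories.
    second_aggregate = aggregate_q(q_fn, negative)
    return positive_aggregate - second_aggregate


def should_deploy(estimate: float, threshold: float) -> bool:
    """Deploy only if the estimate meets an assumed, externally chosen threshold."""
    return estimate >= threshold


# Toy validation data: one successful and one unsuccessful trajectory.
positive_trajectories = [[((1.0,), 1), ((1.0,), 1)]]
negative_trajectories = [[((0.0,), 0), ((0.5,), 0)]]

estimate = performance_estimate(toy_q, positive_trajectories, negative_trajectories)
deploy = should_deploy(estimate, threshold=1.0)
```

A larger gap between the two aggregates suggests that the Q network separates successful from unsuccessful target-environment behavior, which is the intuition behind using such a gap as a deployment signal in this sketch.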
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA whose telephone number is (571)270-7232. The examiner can normally be reached 10:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy, can be reached at 571-272-7795. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/QUAN M HUA/            Primary Examiner, Art Unit 2645