DETAILED ACTION
Claims 1-20 are pending in this action.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/12/2020, 11/10/2020, 6/21/2021 and 8/10/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements have been considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 12, 14-17 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Nakada (WO-2018123606-A1).

As per claim 1, Nakada teaches a computing system comprising: a computation engine comprising processing circuitry (Page 2, para. 8, PC has processor and computation function), wherein the computation engine is configured to obtain interaction data generated by a reinforcement learning agent (Page 2, para. 10-11, receiving reinforcement data from an agent), the interaction data characterizing one or more tasks in an environment (Page 3, para. 9, interaction/tasks are that the agent can move in an area) and characterizing one or more interactions of the reinforcement learning agent with the environment (Page 3, para. 3, movement route is the agent’s movement through the environment) see also (Page 3, para. 6, movement route is a collection of movement actions of agent), the one or more interactions performed according to trained policies for the reinforcement learning agent (Page 3, para. 3, movement route based on movement policy which is reinforced and corrected by the system see Page 3, para. 2), wherein the computation engine is configured to process the interaction data to apply a first analysis function to the one or more tasks to generate first elements (Page 3, para. 4, process supplied movement route using reverse reinforcement learning) also (Page 3, para. 6, movement tasks are used to generate position and probability elements), wherein the computation engine is configured to process the interaction data to apply a second analysis function to the one or more interactions to generate second elements (Page 3, para. 4, process supplied movement route with a reward basis function) also (Page 3, para. 6, movement tasks are used to generate position and probability elements), the first analysis function different than the second analysis function (Page 4, para. 4, two different learning model/function, position, probability, etc.), wherein the computation engine is configured to process at least one of the first elements and the second elements to generate third elements denoting one or more characteristics of the one or more interactions (Page 3, para. 9, processing probability, position, reward, route to generate reward values, probability distribution, corrected models), and wherein the computation engine is configured to output an indication of the third elements to a user to provide an explanation of the one or more interactions of the reinforcement learning agent with the environment (Page 3, para. 9, displaying probability distribution, model, corrected routes) see also (Page 3, para. 2, Page 4, para. 3).

As per claim 2, Nakada teaches the computing system of claim 1, wherein the computation engine is configured to output a request for a decision for one or more actions to perform by the reinforcement learning agent within the environment (Page 3, para. 3, accepting a user input of a movement route for the movement agent) (Examiner Note: “accepting” is interpreted to be a form of request; see also Page 5, para. 11, where accepting is described as prompting and waiting for a particular input from the user), wherein the computation engine is configured to receive, from the user, decision data indicating a decision of the user responsive to the request for the decision (Page 3, para. 3, receiving the movement route), and wherein the computation engine is configured to process the decision data to modify the trained policies for the reinforcement learning agent (Page 3, para. 7, optimizing movement policy), retrain the reinforcement learning agent (Page 3, para. 7, agent will now follow new movement policy and reward function), or provide control to the reinforcement learning agent (Page 3, para. 6, agent gets to move at some point after optimization/correction).

As per claim 3, Nakada teaches the computing system of claim 1, wherein the computation engine is configured to execute the reinforcement learning agent to perform the one or more tasks to generate the trained policies (Page 3, para. 6 and 7, by performing moves of movement routes, agent facilitates new optimized movement policies).

As per claim 4, Nakada teaches the computing system of claim 1, wherein the first analysis function comprises a transition analysis function (Page 3, para. 3, movement analysis of agent positions and moves see also Page 3, para. 6), wherein to generate the first elements, the computation engine applies the transition analysis function to the one or more interactions to identify a reinforcement learning agent transition having a certainty level that meets a threshold (Page 7, para. 9, determining an optimized movement or route that has a distance difference larger than a threshold), and wherein the first elements comprise an indication of the identified reinforcement learning agent transition (Page 3, para. 6, first elements can be agent moves or movement routes see also Page 3, para. 3).

As per claim 5, Nakada teaches the computing system of claim 1, wherein the first analysis function comprises a reward analysis function (Page 11, para. 7, reward basis function), wherein to generate the first elements, the computation engine applies the reward analysis function to rewards of the one or more interactions to identify an interaction having a reward value that meets a distribution threshold (Page 11, para. 7, comparing the current reward value with reward value distribution for meeting a threshold), and wherein the first elements comprise an indication of the identified interaction (Page 3, para. 6, first elements can be agent moves or movement routes see also Page 3, para. 3).
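
For illustration only, the claim 5 limitation of identifying an interaction whose reward value meets a distribution threshold can be sketched as below. The z-score test, the function name flag_reward_outliers, the threshold of 1.5, and the sample rewards are all assumptions made for this sketch; none of them is drawn from Nakada or from the application, and other distribution tests would fit the claim language equally well.

```python
from statistics import mean, pstdev


def flag_reward_outliers(rewards, z_threshold=1.5):
    """Return indices of interactions whose reward lies at least
    z_threshold population standard deviations from the mean reward.

    Hypothetical helper: the z-score criterion is an assumed stand-in
    for "a reward value that meets a distribution threshold".
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All rewards are identical, so nothing stands out.
        return []
    return [i for i, r in enumerate(rewards)
            if abs(r - mu) / sigma >= z_threshold]


# Made-up rewards for five interactions; only the last one is unusual.
sample_rewards = [1.0, 1.0, 1.0, 1.0, 10.0]
flagged = flag_reward_outliers(sample_rewards)
```

In this sketch the returned indices play the role of the "indication of the identified interaction" recited in the claim.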

As per claim 12, Nakada teaches the computing system of claim 1, wherein the computation engine is configured to generate, based on the third elements, one or more training scenarios (Page 3, para. 1, creating new optimized reinforcement learning models used for new trainings).

As per claim 14, Nakada teaches the computing system of claim 1, wherein the computation engine is configured to receive a query for a most likely sequence for the reinforcement learning agent, wherein the third elements comprise the most likely sequence (Page 4, para. 3, probability density distribution is used to predict the movement route from each grid position reaching goal, i.e. most likely route to goal).

As per claim 15, the substance of the claimed invention is identical or substantially similar to that of claim 1. Accordingly, this claim is rejected under the same rationale.

As per claim 16, Nakada teaches the method of claim 15, wherein the first analysis comprises one of a transition analysis function or a reward analysis function (Page 3, para. 4, movement policy analysis and reward basis function both applied).

As per claim 17, Nakada teaches the method of claim 15, wherein the second analysis function comprises an interaction analysis function (Page 12, para. 7-9, agent can exist in VR where it interacts with the environment and the movement policy is analyzed).

As per claim 20, the substance of the claimed invention is identical or substantially similar to that of claim 1. Accordingly, this claim is rejected under the same rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 6, 7, 11, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nakada in view of Baughman et al. (US PGPUB No. 2017/0318035 A1) [hereinafter “Baughman”].

As per claim 6, Nakada teaches the computing system of claim 1.
Nakada does not explicitly teach wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data. Baughman teaches wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data ([0139], observing feature frequency or count, abnormal behavior and/or confidence analysis).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

As per claim 7, Nakada teaches the computing system of claim 1, wherein the first analysis function is a function included in an environmental analysis level of a multi-level introspection framework (Page 3, para. 4, multiple reinforcement models are used to analyze and optimize the movement strategy/policy and route thru the environment see also Page 3, para. 2), wherein the second analysis function is a function included in an interaction analysis level of the multi-level introspection framework (Page 3, para. 4, reward and movement analysis both considered “interaction” analysis).
Nakada does not explicitly teach wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework. Baughman teaches wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework ([0137] and [0139], analyzing metadata along with confidence values to make classification and error corrections to learning models).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

As per claim 11, Nakada teaches the computing system of claim 1, wherein to output the indication of the third elements the computation engine is configured to: compute, based on the third elements, summary data for a plurality of analysis functions (Page 4, para. 8, environmental map with contour lines and policy data superimposed and displayed to user for viewing and decision making, i.e. summary), the summary data comprising one or more of: a maxima state (Page 4, para. 3, route has a starting position and a goal position each can be viewed as a maximum state or minimum state), a minima state (Page 4, para. 3, route has a starting position and a goal position each can be viewed as a maximum state or minimum state), a state-action pair with associated certainty (Page 4, para. 3, probability density distribution are the begin state and end/goal state on the grid with a probability), a most likely sequence from a minima state to a maxima state (Page 4, para. 3, the probability density distribution predicts the movement routes for each grid position, i.e. most likely sequences – as stated begin state on grid or end/goal state can be switched as a maximum or minimum state), or a most likely sequence from a maxima state to a minima state see id.; and output, to a display device, the summary data (Page 4, para. 3-6, Contour lines of probability and other “summarized” data superimposed and displayed to user).
Nakada does not explicitly teach a state with associated frequency value. Baughman teaches a state with associated frequency value ([0128] and [0139], tracking counts of analyzed features which can include any co-related data to DNS traffic like states).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, a state with associated frequency value, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

As per claim 18, Nakada teaches the method of claim 15, and a value analysis function (Page 3, para. 3, analyzing the user input of values for movement routes).
Nakada does not explicitly teach wherein the second analysis function comprise one of an observation frequency analysis function, an observation-action frequency analysis function, or a value analysis function, and wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data. Baughman teaches wherein the second analysis function comprise one of an observation frequency analysis function or an observation-action frequency analysis function ([0139], using feature count or frequency by the deep learning system for classification and error correction functionality – this includes activity analysis), and wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data ([0139], observing feature frequency or count, abnormal behavior and/or confidence analysis).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein the second analysis function comprise one of an observation frequency analysis function, an observation-action frequency analysis function, or a value analysis function, and wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

As per claim 19, Nakada teaches the method of claim 15.
Nakada does not explicitly teach wherein processing the first elements and the second elements comprises applying a meta-analysis function. Baughman teaches wherein processing the first elements and the second elements comprises applying a meta-analysis function ([0139], using metadata like feature count or occurrence to analyze classification/error correction functionality).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein processing the first elements and the second elements comprises applying a meta-analysis function, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Nakada in view of Zhang et al. (CN-109242041-A) [hereinafter “Zhang”].

As per claim 8, Nakada teaches the computing system of claim 1, wherein the first analysis function comprises a value function, wherein the first elements comprise respective values indicating expected respective rewards for one or more state of the environment (Page 11, para. 7, reward basis function used to calculate reward values for various movements/routes), wherein the second analysis function comprises a transition probability function, wherein the second elements comprise transition probability values each indicating a probability of a transition to a new state of the environment given a state of the environment and an action (Page 4, para. 3, calculating probability of agent reaching goal based on agent movement from current position/state on grid) and the values and the transition probability values (Page 4, para. 3, movement values and probability of movement).
Nakada does not explicitly teach the computation engine is configured to compute at least one of local minima or maxima, absolute minima or maxima, observation variance outliers, or strict-difference variance outliers. Zhang teaches the computation engine is configured to compute at least one of local minima or maxima, absolute minima or maxima, observation variance outliers, or strict-difference variance outliers (Page 5, para. 3, determining max and min of each data sequence, i.e. local data, abnormal data, i.e. variance outlier, which includes marked difference data).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Zhang, the computation engine is configured to compute at least one of local minima or maxima, absolute minima or maxima, observation variance outliers, or strict-difference variance outliers, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
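
For illustration only, computing local minima or maxima over a sequence of values, as recited in claim 8, can be sketched as below. The function name local_extrema, the strict-inequality neighbor test, and the sample values are assumptions made for this sketch; they are not taken from Zhang, Nakada, or the application.

```python
def local_extrema(values):
    """Return (maxima_indices, minima_indices) for interior points that
    are strictly greater or strictly smaller than both neighbors.

    Hypothetical helper: endpoints and flat plateaus are ignored, which
    is one assumed reading of "local minima or maxima".
    """
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        if mid > left and mid > right:
            maxima.append(i)
        elif mid < left and mid < right:
            minima.append(i)
    return maxima, minima


# Made-up state values along one trajectory.
state_values = [1, 3, 2, 0, 4, 4, 1]
maxima, minima = local_extrema(state_values)
```

Here index 1 is a local maximum and index 3 a local minimum; the plateau at indices 4 and 5 is skipped because of the strict comparison.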

Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Nakada in view of Baughman in further view of Zhang.

As per claim 9, Nakada teaches the computing system of claim 1, wherein the first analysis function comprises a value function, wherein the first elements comprise respective values indicating expected respective rewards for one or more state of the environment (Page 11, para. 7, reward basis function used to calculate reward values for various movements/routes), wherein the second analysis function comprises a transition probability function, wherein the second elements comprise transition probability values each indicating a probability of a transition to a new state of the environment given a state of the environment and an action (Page 4, para. 3, calculating probability of agent reaching goal based on agent movement from current position/state on grid) and generate a transition graph based on the transition probability values (Page 4, para. 3, movement values and probability of movement); and process the transition graph to identify most likely sequences of transitions of the reinforcement learning agent within the environment, wherein the third elements comprise the most likely sequences (Page 4, para. 3, the probability density distribution predicts the movement routes for each grid position, i.e. most likely sequences).
Nakada does not explicitly teach wherein the interaction data comprises counter data indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent, one or more actions performed for states of the environment, or one or more transitions of the reinforcement learning agent within the environment. Baughman teaches wherein the interaction data comprises counter data indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent ([0139], tracking counts/occurrences of features) see also ([0127], features include analyzed attribute, parameter or cross-correlated data – this includes states), one or more actions performed for states of the environment ([0127], this includes inputs to the traffic, i.e. action), or one or more transitions of the reinforcement learning agent within the environment ([0131], movement of the traffic is followed, classification transitions from benign/malicious, adaptations are made, i.e. also a form of transitions).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein the interaction data comprises counter data indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent, one or more actions performed for states of the environment, or one or more transitions of the reinforcement learning agent within the environment, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
The combination of Nakada and Baughman does not explicitly teach wherein to process the first elements and the second elements the computation engine is configured to: compute local maxima based on the values. Zhang teaches wherein to process the first elements and the second elements the computation engine is configured to: compute local maxima based on the values (Page 5, para. 3, calculating maximum value for a particular sequence, i.e. a local maximum).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada and Baughman with the teachings of Zhang, wherein to process the first elements and the second elements the computation engine is configured to: compute local maxima based on the values, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
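
For illustration only, processing a transition graph to identify a most likely sequence of transitions, as recited in claim 9, can be sketched as below. The dictionary-based graph, the function name most_likely_path, and the use of Dijkstra's algorithm over negative log probabilities are assumptions made for this sketch; they are not the method of Nakada, Baughman, Zhang, or the application.

```python
import heapq
import math


def most_likely_path(transitions, start, goal):
    """Return (path, probability) for the start-to-goal sequence that
    maximizes the product of transition probabilities.

    transitions: {state: {next_state: probability}} (assumed layout).
    Maximizing a product of probabilities in (0, 1] equals minimizing
    the sum of -log(p), so Dijkstra's algorithm applies.
    """
    best = {start: 0.0}   # lowest known -log probability per state
    prev = {}             # predecessor on the best known path
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, state = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state == goal:
            break
        for nxt, p in transitions.get(state, {}).items():
            if p <= 0:
                continue  # impossible transition
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = state
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[goal])


# Made-up transition probabilities between four states.
graph = {"A": {"B": 0.9, "C": 0.1}, "B": {"D": 0.5}, "C": {"D": 1.0}}
path, prob = most_likely_path(graph, "A", "D")
```

In this made-up graph the route through B has probability 0.9 × 0.5 = 0.45, which exceeds the 0.1 × 1.0 = 0.1 of the route through C, so the sketch returns the A, B, D sequence.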

As per claim 10, Nakada teaches the computing system of claim 1, wherein the first analysis function comprises a reward analysis function, wherein to generate the first elements (Page 11, para. 7, reward basis function), the computation engine applies the reward analysis function to rewards of the one or more interactions to identify an interaction having a reward value that meets a distribution threshold (Page 11, para. 7, finding the reward value of an action that meets the distribution threshold), and wherein the first elements comprise an indication of the identified interaction (Page 11, para. 8, reward value linked with agent action/s), wherein the second analysis function comprises a value analysis function, wherein the interaction data comprises at least one: value data for one or more actions performed for states of the environment (Page 4, para. 3, probability for each route on the grid positions in the environment), or prediction error for one or more actions performed for the one or more states of the environment (Page 4, para. 3, probability is a prediction of error/success – for movement route to reaching goal), and wherein to generate the second elements, the computation engine applies the value analysis function to at least one of the value data or prediction error (Page 4, para. 3, applying movement routes to probability of movement for success).
Nakada does not explicitly teach wherein the interaction data comprises counter data indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent, one or more actions performed for states of the environment, or one or more transitions of the reinforcement learning agent within the environment. Baughman teaches wherein the interaction data comprises counter data ([0139], count or occurrence of a feature) indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent ([0126] a feature includes a state of the environment since it is formulated from attributes, parameters and any correlated DNS data), one or more actions performed for states of the environment ([0115], DNS transactions are actions performed to alter the state of the environment), or one or more transitions of the reinforcement learning agent within the environment ([0139], providing features for deep learning which retrains, i.e. transitions, the learning system).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Baughman, wherein the interaction data comprises counter data indicating respective numbers for at least: one or more states of the environment interacted with by the reinforcement learning agent, one or more actions performed for states of the environment, or one or more transitions of the reinforcement learning agent within the environment, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
The combination of Nakada and Baughman does not explicitly teach generate outlier data, wherein the second elements comprise the outlier data and wherein to process the first elements and the second elements the computation engine is configured to identify contradiction data comprising at least one of contradictory-value observations, contradictory-count observations, or contradictory-goal observations, and wherein the third elements comprise the contradiction data. Zhang teaches generate outlier data, wherein the second elements comprise the outlier data (Page 5, para. 3, collecting abnormal data) and wherein to process the first elements and the second elements the computation engine is configured to identify contradiction data comprising at least one of contradictory-value observations, contradictory-count observations, or contradictory-goal observations, and wherein the third elements comprise the contradiction data (Page 5, para. 3, abnormal data is interpreted to be contradiction data which is values – measurements of an electric energy meter can include a count and also an absolute threshold, i.e. goal see also Page 7, para. 2-5).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada and Baughman with the teachings of Zhang, generate outlier data, wherein the second elements comprise the outlier data and wherein to process the first elements and the second elements the computation engine is configured to identify contradiction data comprising at least one of contradictory-value observations, contradictory-count observations, or contradictory-goal observations, and wherein the third elements comprise the contradiction data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nakada in view of Cella et al. (US PGPUB No. 2019/0171187 A1) [hereinafter “Cella”].

As per claim 13, Nakada teaches the computing system of claim 1.
Nakada does not explicitly teach wherein the interaction data is for one of: an autonomous vehicle, a conversational assistant, a medical system, a network automation system, a home automation system, or an industrial control system. Cella teaches wherein the interaction data is for one of: an autonomous vehicle ([0231], data collecting for industrial vehicles and support for autonomous control of the system see [0801]), a conversational assistant ([0235], natural language/speech processing), a medical system ([0229], medical diagnostic), a network automation system ([0135], network transport system with automated action), a home automation system ([0229], home automation), or an industrial control system (Abstract, industrial IOT systems).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Nakada with the teachings of Cella, wherein the interaction data is for one of: an autonomous vehicle, a conversational assistant, a medical system, a network automation system, a home automation system, or an industrial control system, to provide the efficiency and productivity inducing features of multi-level data analysis to multiple relevant data fields and environments.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. George et al. (US PGPUB No. 2019/0335006 A1), Coleman et al. (US PGPUB No. 2019/0113973 A1), Gebre et al. (WO-2019052810-A1), Carroll et al. (US PGPUB No. 2019/0026475 A1), Jian et al. (CN-106683672-A), Nicola et al. ("Improvement of PMSM Control Using Reinforcement Learning Deep Deterministic Policy Gradient Agent," 2021 21st International Symposium on Power Electronics (Ee), 2021, pp. 1-6, doi: 10.1109/Ee53374.2021.9628371), Song et al. ("Memristive Neural Network Based Reinforcement Learning with Reward Shaping for Path Finding," 2018 5th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), 2018, pp. 200-205, doi: 10.1109/ICCSS.2018.8572398) and Zhong et al. ("Deep Actor-Critic Reinforcement Learning for Anomaly Detection," arXiv:1908.10755v1, Aug. 28, 2019, pp. 1-6), all disclose various aspects of multi-level reinforcement of training models/policies.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER C SHAW whose telephone number is (571)270-7179. The examiner can normally be reached Max Flex.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carl Colin can be reached on 571-272-3862. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PETER C SHAW/
Primary Examiner, Art Unit 2493
September 26, 2022