DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are presented for examination.
Claims 1-5, 8-15, and 18-20 are rejected.
Claims 6-7 and 16-17 are objected to.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10, 15, 18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
          Claims 10, 15, 18, and 20 recite the limitation “…hierarchical reinforcement learning…”, which raises issues of indefiniteness, vagueness, and lack of antecedent basis. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 8-15, and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Song et al. (US 2021/0192358 A1; hereinafter “Song”).

          Consider claim 1:
                    Song teaches a system for training a machine learning system (Song, e.g., “Methods, systems, and apparatus…predicting the actions of, or influences on, agents in environments with multiple agents, in particular for reinforcement learning…(RFM) system receives agent data representing agent actions…process the agent data as graph data to provide encoded graph data…process the encoded graph data to provide processed graph data…decode the processed graph data to provide…provide representation data for node and/or edge attributes...” of Abstract, ¶ [0005]-¶ [0006], ¶ [0008], and ¶ [0014], Fig. 1 elements 100-120, Fig. 2b steps 200-206, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b), the system comprising: a communication interface that receives environment data (Song, e.g., “…receives a semantic description of the state of the environment 104…” of Fig. 1 elements 100-120, Fig. 4 steps 400-408), the environment data relating to an environment in which the machine learning system may operate (Song, e.g., “…robots or vehicles in a real or simulated environment and the system may predict or explain their behavior, e.g. for safety or control purposes…The neural network system 100 receives a semantic description of the state of the environment 104…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b); a memory containing machine readable medium storing machine executable code (Song, e.g., “…example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices…” of ¶ [0089]); and one or more processors coupled to the memory and configurable to execute the machine executable code (Song, e.g., “…storing computer program instructions and data include all forms of non-volatile memory…example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices…” of ¶ [0089], Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 
5a-b elements 500a-b) to: generate from the environment data a graph abstraction for the environment (Song, e.g., “…the input graph using the GN encoder 302 to provide encoded graph data (402), processes the encoded graph data with the Graph GRU 304 to provide a latent graph (404)…provide the output graph (406)…determine the representation data from the output graph…to predict or explain the actions of one or more of the agents (408)…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b), the graph abstraction comprising a plurality of nodes and edges (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…the agent and non-agent entities may be represented by a node and edges may connect each agent to each other agent and to each non-agent entity…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b), wherein nodes represent points of interest in the environment (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b) and edges represent traversals between the nodes (Song, e.g., “…wherein the edges connect the agents to each other and to the non-agent entities, and wherein the encoded graph data comprises node attributes and edge attributes representing an updated version of the graph data…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 
5a-b elements 500a-b); and perform hierarchical reinforcement learning using the graph abstraction to train the machine learning system (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…an input to obtain state data representing a state of the shared environment…an action selection policy neural network to process the state data and reward data to select the actions…receive and process the representation data to select the actions…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 2:
                    Song teaches everything claimed as implemented above in the rejection of claim 1. In addition, Song teaches wherein the one or more processors configurable to execute the machine executable code discover one or more pivotal states in the environment (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b).

          Consider claim 3:
                    Song teaches everything claimed as implemented above in the rejection of claim 2. In addition, Song teaches wherein the one or more processors configurable to execute the machine executable code generate edge connections for the graph abstraction using the one or more pivotal states (Song, e.g., “…wherein the edges connect the agents to each other and to the non-agent entities…edge attributes representing an updated version of the graph data…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b).

          Consider claim 4:
                    Song teaches everything claimed as implemented above in the rejection of claim 1. In addition, Song teaches wherein the one or more processors configurable to execute the machine executable code implement a goal-conditioned agent to sample goals in a random walk of the graph abstraction (Song, e.g., “…a process for using the graph processing neural network system 106 to provide representation data predicting an agent action. For each time step t the process inputs data defining a (semantic) description of the state of the environment and builds the input graph using this data (400)…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 5:
                    Song teaches everything claimed as implemented above in the rejection of claim 4. In addition, Song teaches wherein knowledge gained by the goal-conditioned agent in the random walk of the graph abstraction is transferred to subsequent tasks for the machine learning system (Song, e.g., “…The agent incorporates the graph processing neural network system (Relational Forward Model, RFM) 106…the RFM 106 receives the semantic description of the state of the environment… receives for one or more other agents, e.g. teammates, the last action taken by the agent…infer some or all of this information from observations of the environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 8:
                    Song teaches everything claimed as implemented above in the rejection of claim 1. In addition, Song teaches wherein the one or more processors configurable to execute the machine executable code execute a Wide-then-Narrow Instruction (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…an input to obtain state data representing a state of the shared environment…an action selection policy neural network to process the state data and reward data to select the actions…receive and process the representation data to select the actions…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 9:
                    Song teaches everything claimed as implemented above in the rejection of claim 8. In addition, Song teaches wherein options for the Wide-then-Narrow Instruction are limited to pivotal states discovered during the generation of the graph abstraction (Song, e.g., “…The agent incorporates the graph processing neural network system (Relational Forward Model, RFM) 106…the RFM 106 receives the semantic description of the state of the environment… receives for one or more other agents, e.g. teammates, the last action taken by the agent…infer some or all of this information from observations of the environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 10:
                    Song teaches everything claimed as implemented above in the rejection of claim 1. In addition, Song teaches wherein the one or more processors configurable to execute the machine executable code implement a Feudal Network to perform hierarchical reinforcement learning (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 11:
                    Song teaches a method for training a machine learning system (Song, e.g., “Methods, systems, and apparatus…predicting the actions of, or influences on, agents in environments with multiple agents, in particular for reinforcement learning…(RFM) system receives agent data representing agent actions…process the agent data as graph data to provide encoded graph data…process the encoded graph data to provide processed graph data…decode the processed graph data to provide…provide representation data for node and/or edge attributes...” of Abstract, ¶ [0005]-¶ [0006], ¶ [0008], and ¶ [0014], Fig. 1 elements 100-120, Fig. 2b steps 200-206, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b) comprising: receiving, at one or more processors, environment data (Song, e.g., “…receives a semantic description of the state of the environment 104…” of Fig. 1 elements 100-120, Fig. 4 steps 400-408), the environment data relating to an environment in which the machine learning system may operate (Song, e.g., “…robots or vehicles in a real or simulated environment and the system may predict or explain their behavior, e.g. for safety or control purposes…The neural network system 100 receives a semantic description of the state of the environment 104…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b); generating from the environment data, at the one or more processors, a graph abstraction for the environment (Song, e.g., “…the input graph using the GN encoder 302 to provide encoded graph data (402), processes the encoded graph data with the Graph GRU 304 to provide a latent graph (404)…provide the output graph (406)…determine the representation data from the output graph…to predict or explain the actions of one or more of the agents (408)…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 
5a-b elements 500a-b), the graph abstraction comprising a plurality of nodes and edges (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…the agent and non-agent entities may be represented by a node and edges may connect each agent to each other agent and to each non-agent entity…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b), wherein nodes represent points of interest in the environment (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b) and edges represent traversals between the nodes (Song, e.g., “…wherein the edges connect the agents to each other and to the non-agent entities, and wherein the encoded graph data comprises node attributes and edge attributes representing an updated version of the graph data…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b); and performing hierarchical reinforcement learning, at the one or more processors, using the graph abstraction to train the machine learning system (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…an input to obtain state data representing a state of the shared environment…an action selection policy neural network to process the state data and reward data to select the actions…receive and process the representation data to select the actions…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 12:
                    Song teaches everything claimed as implemented above in the rejection of claim 11. In addition, Song teaches wherein generating the graph abstraction for the environment comprises discovering one or more pivotal states in the environment (Song, e.g., “…The semantic description of the state of the environment may be used to provide attributes for nodes of the input graph…” of ¶ [0014], ¶ [0040]-¶ [0045], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b).

          Consider claim 13:
                    Song teaches everything claimed as implemented above in the rejection of claim 12. In addition, Song teaches wherein generating the graph abstraction for the environment comprises generating edge connections for the graph abstraction using the one or more pivotal states (Song, e.g., “…wherein the edges connect the agents to each other and to the non-agent entities…edge attributes representing an updated version of the graph data…” of ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, and Figs. 5a-b elements 500a-b).

          Consider claim 14:
                    Song teaches everything claimed as implemented above in the rejection of claim 11. In addition, Song teaches wherein generating the graph abstraction for the environment comprises employing a goal-conditioned agent to sample goals in a random walk of the graph abstraction (Song, e.g., “…a process for using the graph processing neural network system 106 to provide representation data predicting an agent action. For each time step t the process inputs data defining a (semantic) description of the state of the environment and builds the input graph using this data (400)…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 15:
                    Song teaches everything claimed as implemented above in the rejection of claim 14. In addition, Song teaches wherein performing hierarchical reinforcement learning comprises transferring knowledge gained by the goal-conditioned agent in the random walk of the graph abstraction to subsequent tasks for the machine learning system (Song, e.g., “…The agent incorporates the graph processing neural network system (Relational Forward Model, RFM) 106…the RFM 106 receives the semantic description of the state of the environment… receives for one or more other agents, e.g. teammates, the last action taken by the agent…infer some or all of this information from observations of the environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 18:
                    Song teaches everything claimed as implemented above in the rejection of claim 11. In addition, Song teaches wherein performing hierarchical reinforcement learning comprises executing a Wide-then-Narrow Instruction (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…an input to obtain state data representing a state of the shared environment…an action selection policy neural network to process the state data and reward data to select the actions…receive and process the representation data to select the actions…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 19:
                    Song teaches everything claimed as implemented above in the rejection of claim 18. In addition, Song teaches wherein options for the Wide-then-Narrow Instruction are limited to pivotal states discovered during the generation of the graph abstraction (Song, e.g., “…The agent incorporates the graph processing neural network system (Relational Forward Model, RFM) 106…the RFM 106 receives the semantic description of the state of the environment… receives for one or more other agents, e.g. teammates, the last action taken by the agent…infer some or all of this information from observations of the environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

          Consider claim 20:
                    Song teaches everything claimed as implemented above in the rejection of claim 11. In addition, Song teaches wherein a Feudal Network is used to perform hierarchical reinforcement learning (Song, e.g., “…The reinforcement learning system may be configured to select actions to be performed by one of the agents interacting with the shared environment…” of ¶ [0016]-¶ [0019], ¶ [0040]-¶ [0044], ¶ [0056], ¶ [0059], and Fig. 1 elements 100-120, Fig. 4 steps 400-408, Figs. 5a-b elements 500a-b, and Fig. 6 elements 102-120, 600-606).

Allowable Subject Matter
Claims 6-7 and 16-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, provided that all rejections and objections are overcome. Further, the claimed subject matter of claims 6-7 and 16-17 is not taught or suggested by the prior art of record, either singly or in combination.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

          Jafari Tafti et al. (US Pub. No.: 2019/0204842 A1) teaches “A vehicle, system and method of autonomous navigation of the vehicle. A reference trajectory for navigating a training traffic scenario along a road section is received at a processor of the vehicle. The processor determines a coefficient for a cost function associated with a candidate trajectory that simulates the reference trajectory. The determined coefficient is provided to a neural network to train the neural network. The trained neural network generates a navigation trajectory for navigating the vehicle using a cost coefficient determined by the neural network. The vehicle is navigated along the road section using the navigation trajectory.”

         Ogale et al. (US Pub. No.: 2020/0174490 A1) teaches “Systems, methods, devices, and other techniques for planning a trajectory of a vehicle. A computing system can implement a trajectory planning neural network configured to, at each time step of multiple time steps: obtain a first neural network input and a second neural network input. The first neural network input can characterize a set of waypoints indicated by the waypoint data, and the second neural network input can characterize (a) environmental data that represents a current state of an environment of the vehicle and (b) navigation data that represents a planned navigation route for the vehicle. The trajectory planning neural network may process the first neural network input and the second neural network input to generate a set of output scores, where each output score in the set of output scores corresponds to a different location of a set of possible locations in a vicinity of the vehicle.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BABAR SARWAR, whose telephone number is (571)270-5584. The examiner can normally be reached Mon-Fri, 9:00 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faris S. Almatrahi, can be reached at (313)446-4821. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BABAR SARWAR/
Primary Examiner, Art Unit 3667