Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This communication is in response to the filing of Application 16/578,913, THEOCHAROUS et al. for “LIFELONG LEARNING WITH A CHANGING ACTION SET”, which was filed on 09/23/2019.
Claims 1-20 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/23/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement has been considered by the examiner. 

Reasons for Allowance
Claims 1-20 are allowed; the following is an examiner’s statement of reasons for allowance:
A detailed search has been performed and no prior art has been found that, alone or in any reasonable combination, reads on the claims. The closest prior art found is as follows:

Regarding exemplary independent claim 1, TSITKIN teaches a method for decision-making, comprising: identifying a decision-making process that includes an increasing set of actions; (TSITKIN, Fig. 2, steps 200, 230, paragraphs 27, 53-57, teach an MDP problem with a finite set of actions that increases based on a vector of random variables at time t.)
computing a policy function for a Markov decision process (MDP) for the decision-making process, (TSITKIN, Fig. 2, steps 200, 230, paragraphs 27, 53-57, teach computing the MDP problem.)
identifying an additional set of actions for an agent of the MDP; (TSITKIN, Fig. 2, steps 220, 230, paragraphs 27, 56-57, teach identifying the value of a changed action entry.)
selecting an action based on the updated policy function and the state information; (TSITKIN, Fig. 2, steps 230, 240, paragraphs 27, 57-58, teach selecting and applying a current optimal policy for the MDP problem based on the action entries.)

Koseki et al. (US20180293514A1) teaches updating the inverse dynamics function based at least in part on the additional set of actions; (KOSEKI, Fig. 5, steps 530-560, paragraphs 38, 52-55, teach using inverse reinforcement (i.e. inverse dynamics function) based at least in part on a set of transition probabilities comprising a set of actions.) 
updating the policy function based on the updated inverse dynamics function; receiving state information for the agent; (KOSEKI, Fig. 5, steps 530-560, paragraphs 38, 52-55, teach updating the rule engine for an agent.)
and transmitting an action recommendation to the agent based on the selected action. (KOSEKI, Fig. 5, steps 530-560, in view of paragraphs 2, 25, teach performing the steps as described in Fig. 5 to update a rule for an agent.)

Zadorojniy et al. (US20220101177A1) is directed towards receiving a plurality of domain specific heuristics and a set of states and a set of actions, where an immediate cost and/or reward is associated with a pair of state and action (Abstract). More particularly, Fig. 2, step 201, paragraph 49, teach receiving domain specific heuristics with their input parameters; an action space, a state space, and an immediate cost per action and state are also received as input for the MDP model. However, Zadorojniy does not qualify as prior art because the effective filing date of the instant application is prior to the effective filing date of Zadorojniy.

Tatsubori et al. (US20200372323A1) is directed towards detecting a higher-level action from one or more trajectories of real states, where the trajectories are based on an expert's action demonstration, and training predictors to predict future states (Abstract). More particularly, Fig. 2, step 230, paragraph 35, teach performing an action responsive to the designation of the respective pairs; for example, the action can be and/or otherwise involve forming an action plan that includes the pair or a member of the pair.

Non-Patent Literature “Planning and Learning with Stochastic Action Sets” is directed towards reinforcement learning (RL) in which the set of available actions is governed by an exogenous stochastic process (Abstract, page 1). More particularly, as described in the Introduction (section 1), the challenge addressed is that many practical MDP and RL problems have stochastic sets of feasible actions that vary at specific states.

However, none of these references, taken alone or in any reasonable combination, teaches the features of: “wherein the policy function is computed based on a state conditional function mapping states into an embedding space, an inverse dynamics function mapping state transitions into the embedding space, and an action selection function mapping the elements of the embedding space to actions;” as recited in claim 1 and similarly recited in independent claims 14 and 18.
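For illustration only, the three-function structure recited in the quoted limitation can be sketched as follows. This is a minimal toy sketch and not the applicant's implementation or any cited reference's method: the function names (state_conditional, inverse_dynamics, embed_actions, select_action, policy), the two-dimensional states, the per-action averaging of transition embeddings, and the nearest-neighbor selection rule are all assumptions made for this example.

```python
import math
from typing import Dict, Iterable, Tuple

Vector = Tuple[float, ...]
State = Tuple[float, ...]
Transition = Tuple[State, str, State]  # (state, action label, next state)


def state_conditional(state: State) -> Vector:
    """Hypothetical state conditional function: state -> embedding.

    Toy assumption: the embedding is the displacement that would move
    the agent from `state` toward the origin.
    """
    return tuple(-x for x in state)


def inverse_dynamics(state: State, next_state: State) -> Vector:
    """Hypothetical inverse dynamics function: transition -> embedding.

    Toy assumption: a transition is embedded as its displacement s' - s.
    """
    return tuple(n - s for s, n in zip(state, next_state))


def embed_actions(transitions: Iterable[Transition]) -> Dict[str, Vector]:
    """Place each action in the embedding space by averaging the
    inverse-dynamics embeddings of the transitions it produced."""
    sums: Dict[str, list] = {}
    counts: Dict[str, int] = {}
    for state, action, next_state in transitions:
        e = inverse_dynamics(state, next_state)
        if action not in sums:
            sums[action] = [0.0] * len(e)
            counts[action] = 0
        sums[action] = [a + b for a, b in zip(sums[action], e)]
        counts[action] += 1
    return {a: tuple(x / counts[a] for x in sums[a]) for a in sums}


def select_action(embedding: Vector, action_embeddings: Dict[str, Vector]) -> str:
    """Hypothetical action selection function: embedding -> action.

    Toy assumption: pick the action whose embedding is nearest
    (Euclidean distance) to the requested embedding.
    """
    return min(
        action_embeddings,
        key=lambda a: math.dist(embedding, action_embeddings[a]),
    )


def policy(state: State, action_embeddings: Dict[str, Vector]) -> str:
    """Policy as the composition: state -> embedding -> action."""
    return select_action(state_conditional(state), action_embeddings)


# Initial action set, observed through example transitions.
actions = embed_actions([
    ((0.0, 0.0), "right", (1.0, 0.0)),
    ((0.0, 0.0), "up", (0.0, 1.0)),
])

# An additional action becomes available; only the action embeddings
# change, while state_conditional is left as-is.
actions.update(embed_actions([((0.0, 0.0), "left", (-1.0, 0.0))]))
chosen = policy((3.0, 0.0), actions)
```

In this sketch, enlarging the action set only adds entries to the action embedding table, which is one way to picture a policy whose action-dependent part is updated separately from its state-dependent part.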

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.” 

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALLI Z BUTT whose telephone number is (571)272-5822.  The examiner can normally be reached from 9:00 AM to 5:30 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Jiang, can be reached on 571-270-7191.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/WALLI Z BUTT/Examiner, Art Unit 2412