DETAILED ACTION
[1]	Remarks
I.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
II.	Claims 2-21 are pending and have been examined; claims 2-21 are found allowable. Explanations are provided below.
III.	Inventor and assignee searches were performed, and it was determined that no double patenting rejection is necessary. The examiner reviewed the claims of issued patent 10,776,670, but the claims of the issued patent and the current claims do not overlap in scope sufficiently to necessitate a double patenting rejection.
IV.	Patent eligibility (under the guidance updated in 2019) is shown by the following: Claims 2-21 pass the patent eligibility test because there is no limitation or combination of limitations amounting to an abstract idea. In addition, the following limitations, alone or in combination: “processing the current observation using a model-free reinforcement learning neural network to generate a model-free output, wherein the model-free reinforcement learning neural network does not comprise a model of the environment; generating, from the imagination code and the model-free output, action policy data that defines an action policy for controlling the agent in response to the current observation;” effect a transformation or reduction of a particular article to a different state or thing, add specific limitations other than what is well-understood, routine, and conventional in the field, or add unconventional steps that confine the claim to a particular useful application, providing improvements to the technical field of 
V.	The PCT application, PCT/EP2018/063283, has been considered, and the examiner determined that no prior art references are relevant to the claims of the current application.

[2]	Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):                                                                                                          
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function. Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function.

Claims 2-21 are not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, for the following reasons: the limitations are modified by sufficient structure or material for performing the claimed function; the method claims have no association with generic placeholders; and the remaining claims are CRM claims. Upon examination of the specification and claims, and under the best understanding of the scope of the claims, the examiner has determined that no rejection under 35 U.S.C. 112(a) or (b) is necessitated, because sufficient support is provided in the written description and drawings of the invention.

[3]	Reasons for Allowance
Claims 2-21 are allowable. The following is an examiner’s statement of reasons for allowance, which compares the claims to the closest references found. The references are divided into primary and secondary references: a primary reference would have been used as the basis of a 35 U.S.C. 102 rejection or as the main reference of a 35 U.S.C. 103 rejection, and a secondary reference would have been used as a secondary reference in a 35 U.S.C. 103 rejection. However, these references do not cover enough of the claims to warrant a rejection.

Primary reference English (US 2017/0161607), cited in parent application 16/689,058, discloses a neural network system for model-based reinforcement learning, wherein the neural network system is used to select actions to be performed by an agent interacting with an 
receiving a current observation characterizing a current state of the environment at a current time step (see figure 1B, where image pixels are received as input);
generating, from the current observation, and using a model of the environment, an imagination code (see paragraph 5: during the training mode, parameters in the neural network may be updated using stochastic gradient descent, where gradient descent is an optimization algorithm that computes the direction of steepest descent toward a minimum, and the gradient descent is read as the trajectory; see also figure 1B, where 150-OA, 150-OB, 152-OA, 152-OB, 154-OA, 154-OB, 156-OA, 156-OB, and 158-OA are features being encoded, read as the imagination code), but is silent in disclosing “wherein the imagination code is data that characterizes each of one or more predicted future trajectories for the agent starting from the current time step”; and
processing the current observation using a model-free reinforcement learning neural network to generate a model-free output (see figure 1B, element 124, reinforcement learning, where “Gesture Possibilities” is read as the action and recurrent step 114 is read as reinforcement learning), but is silent in disclosing “wherein the model-free reinforcement learning neural network does not comprise a model of the environment.”
English is also silent in disclosing that the imagination core is configured to output trajectory data in response to the current observation, the trajectory data defining a trajectory comprising a sequence of future features of the environment imagined by the imagination core.



Primary reference Jones (US 2018/0086336) discloses:
receiving a current observation characterizing a current state of the environment at a current time step (see figure 1, element 102, acquiring and registering multiple sensor sources);
generating, from the current observation, and using a model of the environment, an imagination code (see figure 1, perceived situation, manipulate memory, and prediction and inference; see also paragraph 27, which involves perceiving situational awareness based on the sensed information, where examples of orientation virtual layer activities are Kalman filtering, model-based matching, machine or deep learning, and Bayesian predictions), but is silent in disclosing “wherein the imagination code is data that characterizes each of one or more predicted future trajectories for the agent starting from the current time step”;
generating, from the trajectory data, a respective rollout embedding for each of the one or more trajectories that represents a summary of the trajectory (see paragraph 27, the decide 
selecting, using the action policy, an action to be performed by the agent in response to the current observation (see figure 1, selecting an action from multiple objects to reach a final decision; see also figure 5, reproduced below).

[Image: Jones (US 2018/0086336), figure 5]
Jones is silent in disclosing “processing the current observation using a model-free reinforcement learning neural network to generate a model-free output, wherein the model-free reinforcement learning neural network does not comprise a model of the environment.”

Wang (US 2019/0258918), which has the same assignee, discloses a method of reinforcement learning comprising:

an observation characterizing a state of the environment, an action performed by the agent in response to the observation, a reward received in response to the agent performing the action, an action selection score assigned to at least the performed action in determining which action to perform in response to the observation;
training an action selection neural network having action selection neural network parameters, here referred to as policy parameters, on the trajectories in the replay memory, which reads on “processing the current observation using a model-free reinforcement learning neural network to generate a model-free output,” but not on “the model-free reinforcement learning neural network does not comprise a model of the environment”;
wherein the action selection neural network is configured to: receive an observation characterizing a state of the environment, which reads on “generating, from the imagination code and the model-free output, action policy data that defines an action policy for controlling the agent in response to the current observation”; and
process the observation to generate a network output that defines a score distribution over possible actions that can be performed by the agent in response to the observation.
See paragraph 7. Wang discloses teachings similar to some of the limitations of claims 2, 10, and 18, but Wang does not qualify as a prior art reference because it shares the same assignee.

Yu discloses:
receiving a current observation characterizing a current state of the environment at a current time step (see figure 4, where r_t, s_t, and s_(t+1) are read as the observation);
generating, from the current observation, and using a model of the environment, an imagination code (see figure 4, receiving the operating environment), but is silent in disclosing “wherein the imagination code is data that characterizes each of one or more predicted future trajectories for the agent starting from the current time step”; and
selecting, using the action policy, an action to be performed by the agent in response to the current observation (see figure 4, where µ(s), returned to the environment, is read as the action policy):

[Image: Yu, figure 4]
Yu is silent in disclosing generating, from the imagination code and the model-free output, action policy data that defines an action policy for controlling the agent in response to the current observation; and processing the current observation using a model-free reinforcement 

Jones, English, Yu, and Yang, taken alone or in combination, fail to disclose all the limitations of claims 2, 10, and 18. For all the reasons above, all claims are allowable.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”



CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEX LIEW (duty station is located in New York City) whose telephone number is (571)272-8623 (FAX 571-273-8623), cell (917)763-1192 or email alexa.liew@uspto.gov. Please note the examiner cannot reply through email unless an internet communication authorization is provided by the applicant. The examiner can be reached anytime. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALEX KOK S LIEW/
Primary Examiner, Art Unit 2668
Telephone: 571-272-8623
Date: 12/15/21