DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 6, 10, 15, 16, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims are directed toward the abstract idea of obtaining a first action (or an action) of an agent based on a current state of the agent, using a cross-entropy guided policy (CGP) neural network (or a q-gradient guided policy (QGP) neural network); controlling to perform the obtained action; and obtaining a second action of the agent based on an input state of the agent (or a Q-value corresponding to an expected return of taking the action), wherein the CGP (or QGP) neural network is trained using a cross-entropy method (CEM) policy neural network (or a Q-function neural network) trained separately from the training of the CGP (or QGP) neural network. These limitations can be performed in the human mind by observing an agent, such as a robot, and predicting its action, which in turn allows the action of the agent/robot to be controlled. With regard to whether the abstract idea is integrated into a practical application, it is clear that Applicant's claims do not comprise any additional elements that, individually or in combination, have integrated the judicial exception into a practical application. Because the abstract idea in Applicant's claims 1, 6, 10, 15, 16, and 20 is implemented on a computer, and the claims recite no further limitations or structural elements beyond the computer/processor, the abstract idea of obtaining one or more actions of an agent and separately training the networks is merely implemented on a computer. Thus, there is no integration of the abstract idea into a practical application.
Please note that the examination guidelines released by the USPTO on January 7, 2019, for determining whether a claim is directed to non-statutory subject matter, provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:
• an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
• an additional element that applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition;
• an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
• an additional element effects a transformation or reduction of a particular article to a different state or thing; and
• an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
It is clear that Applicant's claims do not comprise any of the above additional elements that, individually or in combination, have integrated the judicial exception into a practical application.
With regard to whether the claims recite additional elements that provide significantly more than the recited judicial exception, Applicant's claims do not recite such additional elements. At least one of the claims requires a memory (or a computer-readable medium) storing instructions and a processor/computer to execute the instructions. These generic computer components are claimed to perform their basic functions, at least storing instructions for obtaining actions of the agent and for separately training the neural networks. The recitation of the memory and computer/processor limitations amounts to a mere instruction to implement the abstract idea on the computer. Accordingly, claims 1, 6, 10, 15, 16, and 20 are not patent eligible.
Claims 2-5, 7-9, 11-14, and 17-19 do not comprise any further limitations which cause the abstract idea to be integrated into a practical application or recite significantly more than the abstract idea. Therefore, claims 2-5, 7-9, 11-14, and 17-19 are also rejected under 35 U.S.C. 101, and thus are not patent eligible.
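For reference only, the cross-entropy method (CEM) named in the rejected claims can be illustrated with a minimal sketch. It assumes a one-dimensional action, a Gaussian sampling distribution, and a toy Q-function; the names `cem_select_action` and `toy_q` and all parameter values are hypothetical and are not taken from the application or from the prior art of record.

```python
import random
import statistics


def cem_select_action(q_function, state, iterations=4, samples=64, elites=6, rng=None):
    """Pick an action by iteratively refitting a Gaussian to the best-scoring samples."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    mean, std = 0.0, 1.0           # assumed initial sampling distribution
    for _ in range(iterations):
        # Sample candidate actions from the current Gaussian.
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        # Rank candidates by their Q-value for the given state, best first.
        candidates.sort(key=lambda action: q_function(state, action), reverse=True)
        top = candidates[:elites]
        # Refit the Gaussian to the elite candidates.
        mean = statistics.mean(top)
        std = statistics.pstdev(top) + 1e-6  # small floor keeps sampling valid
    return mean


def toy_q(state, action):
    """Hypothetical Q-function whose best action equals the state."""
    return -(action - state) ** 2


if __name__ == "__main__":
    print(cem_select_action(toy_q, state=0.5))
```

In this sketch the Q-function is only evaluated, never trained; a policy network trained to reproduce the actions selected this way would be a separate step that is not shown here.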

Closest Prior Art
5. 	The closest prior art is as follows. Xu, et al. (US 10,860,926 B2) disclose a reinforcement learning system comprising one or more computers configured to: retrieve training data comprising a plurality of experiences generated as a result of an agent interacting with an environment to perform a task in an attempt to achieve a specified result, each experience comprising an observation characterizing a state of the environment, an action performed by the agent in response to the observation and a reward received in response to the action; and train a reinforcement learning neural network having a plurality of policy parameters to control the agent to perform the task by jointly training (emphasis added by the examiner) (i) the reinforcement learning neural network and (ii) a return function that has one or more return parameters and that calculates returns from rewards received by the agent in response to the actions performed by the agent, comprising: updating the one or more policy parameters for the reinforcement learning neural network based on a first set of the experiences using the return function, comprising: calculating a respective return value for each experience in the first set based on the one or more return parameters of the return function and the rewards in the experiences in the first set, and updating the one or more policy parameters using the respective return values for the experiences through reinforcement learning; updating the one or more return parameters of the return function used to calculate the respective return values based on the one or more updated policy parameters and a second set of the experiences, wherein the one or more return parameters are updated via a gradient ascent or descent method using a meta-objective function differentiated with respect to the one or more return parameters, wherein the meta-objective function is dependent on the one or more policy parameters; retrieving updated experiences generated as a result of the agent interacting with the environment to perform the task under the control of the reinforcement neural network using the one or more updated policy parameters and the one or more updated return parameters; further updating the one or more policy parameters based on a first set of the updated experiences using the one or more updated return parameters; and further updating the one or more return parameters based on the further updated policy parameters and a second set of the updated experiences via the gradient ascent or descent method (See for example, Figure 2). Gu, et al. (Continuous Deep Q-Learning with Model-based Acceleration) provide three main contributions. First, they derive and evaluate a Q-function representation that allows for effective Q-learning in continuous domains. Second, they evaluate several naïve options for incorporating learned models into model-free Q-learning, and show that these options are minimally effective on the continuous control tasks considered. Third, they propose to combine locally linear models with local on-policy imagination rollouts to accelerate model-free continuous Q-learning, and show that this produces a large improvement in sample complexity (See section 1, paragraph 4).
Conclusion
6. 	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Patent Numbers 10,733,502 (See for example, Fig. 1), 10,860,926 (See for example, Fig. 2), and 11,132,211 (See for example, Figs. 6-7); and publications to Gu, et al. (Continuous Deep Q-Learning with Model-based Acceleration) (See section 1, paragraph 4) and Khan, et al. (Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom). Khan, et al. disclose, among other things, that a Markov Decision Process is used to model the problem and Q-learning is used to learn the policy. An ε-greedy policy with linear ε-decay is used for selecting an action. The Q-function is approximated with a convolutional neural network by training it with "Stochastic Gradient Descent" using experience replay (See page 37, right column).
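For reference only, the ε-greedy action selection with linear ε-decay described in connection with Khan, et al. can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference's implementation: the function names `linear_epsilon` and `epsilon_greedy`, the start and end values of ε, and the decay length are all hypothetical.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps (assumed values)."""
    fraction = min(max(step / decay_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return eps_end + (1.0 - fraction) * (eps_start - eps_end)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


if __name__ == "__main__":
    q_values = [0.1, 0.9, 0.3]  # hypothetical Q-values for three discrete actions
    for step in (0, 500, 1000):
        eps = linear_epsilon(step)
        print(step, round(eps, 3), epsilon_greedy(q_values, eps))
```

Early in training ε is high, so most actions are random; as ε decays toward its floor, the action with the largest estimated Q-value is chosen more often.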
7. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL G MARIAM whose telephone number is (571)272-7394. The examiner can normally be reached M-F 7:30-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, EDWARD F URBAN, can be reached at 571-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL G MARIAM/
Primary Examiner, Art Unit 2665