DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The Information Disclosure Statement (IDS) submitted on 09/17/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS has been considered by the Examiner.
Claim Objections
Claim 13 is objected to because of a typographical error. It appears that “the dead-end state” has been erroneously typed as “a dead-end state.” Appropriate correction is required.
Claim 18 is objected to because the claim is missing a period (“.”) at the end of the claim. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
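A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.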

Claims 1, 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yadong Zhu US 20180165745 (hereinafter Zhu) in view of Ishan Jindal US 20190339087 (hereinafter Jindal) and further in view of Kyle Hollins Wray US 20200005645 (hereinafter Wray).
As per claim 1, Zhu teaches: A computerized system comprising:
one or more processors (Zhu: fig. 2); and
computer storage memory having computer-executable instructions stored thereon which, when executed by the one or more processors, perform operations comprising:
determining a current state of an agent within an environment of a decision process that models a performance of a task (“The Agent determines the current state s, and according to a certain strategy, outputs the corresponding action a. Correspondingly” Zhu: para. 61);
determining a plurality of actions based on the decision process, wherein each of the plurality of actions is available for execution at the agent’s current state (“The Agent determines the current state s, and according to a certain strategy, outputs the corresponding action a. Correspondingly, the recommendation server 210 may provide the recommended behavior according to a certain recommendation strategy [plurality of actions based on the MDP] and the current link status of the user” Zhu: para. 61 and “Generally, the program module includes routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types.” Zhu: para. 162 and 168);
Zhu does not teach; however, Jindal discloses: employing a secured policy to select a first action of the plurality of actions (a policy is employed that selects an action; Jindal: para. 92).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhu with the methods of Jindal to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such techniques were known at the time of the instant invention and would have been applied in a predictable manner to enhance the optimization of processing tasks.
The combination of Zhu and Jindal does not teach; however, Wray discloses: the secured policy provides a score for each of the plurality of actions that is based on a probability that performing the action at the agent’s current state will transition the agent’s current state to a dead-end state of the agent (Wray: para. 109).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zhu and Jindal with the teaching of Wray to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such techniques were known at the time of the instant invention and would have been applied in order to enhance the optimization of processing tasks.
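For illustration only, the scoring limitation recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Zhu, Jindal, or Wray: the function names, action labels, and probability values are hypothetical, and the score is assumed to be one minus the probability that the action leads to a dead-end state.

```python
from typing import Dict


def secured_scores(dead_end_prob: Dict[str, float]) -> Dict[str, float]:
    """Score each available action at the current state.

    Assumption (hypothetical): score = 1 - P(dead-end | state, action),
    so actions that are more likely to reach a dead-end state score lower.
    """
    for action, p in dead_end_prob.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {action!r} must lie in [0, 1]")
    return {action: 1.0 - p for action, p in dead_end_prob.items()}


def select_first_action(dead_end_prob: Dict[str, float]) -> str:
    """Select the action with the highest score under the assumed rule."""
    scores = secured_scores(dead_end_prob)
    return max(scores, key=scores.get)


# Hypothetical dead-end probabilities for three actions at one state.
probs = {"left": 0.9, "stay": 0.25, "right": 0.5}
scores = secured_scores(probs)
first_action = select_first_action(probs)
```

Under these assumed values, the action with the lowest dead-end probability receives the highest score and is the one selected.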
As per claim 6, the rejection of claim 1 is incorporated herein. Zhu teaches: the secured policy is determined through a reinforcement learning (RL) method that employs a security cap to reduce an amount of resources employed to explore a plurality of dead-end states of the agent (Zhu: Abs.).
As per claim 19, Zhu teaches: A non-transitory computer-readable media having instructions stored thereon, wherein the instructions, when executed by a processor of a computing device, cause the computing device to perform operations comprising:
determining a current state of an agent within an environment of a Markov Decision Process (MDP) that models a performance of a task (“The Agent determines the current state s, and according to a certain strategy, outputs the corresponding action a. Correspondingly” Zhu: para. 61);
determining a plurality of actions based on the MDP, wherein each of the plurality of actions are available for execution at the agent’s current state (“The Agent determines the current state s, and according to a certain strategy, outputs the corresponding action a. Correspondingly, the recommendation server 210 may provide the recommended behavior according to a certain recommendation strategy [plurality of actions based on the MDP] and the current link status of the user” Zhu: para. 61 and “Generally, the program module includes routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types.” Zhu: para. 162 and 168);
Zhu does not teach; however, Jindal discloses: employing a secured policy to select a first action of the plurality of actions (a policy is employed that selects an action; Jindal: para. 92).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhu with the methods of Jindal to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such techniques were known at the time of the instant invention and would have been applied in a predictable manner to enhance the optimization of processing tasks.
The combination of Zhu and Jindal does not teach; however, Wray discloses: the secured policy provides a score for each of the plurality of actions that is based on a probability that performing the action at the agent’s current state will transition the agent’s current state to a dead-end state of the agent (Wray: para. 109).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zhu and Jindal with the teaching of Wray to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such techniques were known at the time of the instant invention and would have been applied in order to enhance the optimization of processing tasks.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Jindal in view of Wray and further in view of Uday Masurekar et al. US 20170317974 (hereinafter Masurekar).
As per claim 2, the rejection of claim 1 is incorporated herein. The combination of Zhu, Jindal and Wray does not teach; however, Masurekar discloses: the performance of the task includes achieving an objective within a virtualized environment and executing the first action transitions the agent’s current state to a winning state that includes the agent achieving the objective within the virtualized environment (Masurekar: para. 107 and 121).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Jindal in view of Wray and further in view of Scott Pappada US 20150227710 (hereinafter Pappada).
As per claim 3, the rejection of claim 1 is incorporated herein. The combination of Zhu, Jindal and Wray does not teach; however, Pappada discloses: the performance of the task includes providing a therapeutic treatment in a physical environment and executing the first action includes providing a user one or more pharmaceuticals (Pappada: para. 19 and 48).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zhu, Jindal and Wray with the teaching of Pappada to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such techniques were known at the time of the instant invention and would have been applied in order to enhance the utility of the method.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Jindal in view of Wray and further in view of Jianfeng Gao et al. US 20190324795 (hereinafter Gao).
As per claim 8, the rejection of claim 1 is incorporated herein. The combination of Zhu, Jindal and Wray does not teach; however, Gao discloses: each undesired terminal state of a plurality of undesired terminal states of the agent are associated with an exploration reward value of -1.0 and each dead-end state of a plurality of dead-end states of the agent are associated with an exploration reward value that is between -1.0 and 0.0 (Gao: para. 29).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Jindal in view of Wray and further in view of Jordi Grau-Moya US 20200364555 (hereinafter Grau-Moya).
As per claim 9, the rejection of claim 1 is incorporated herein. The combination of Zhu, Jindal and Wray does not teach; however, Grau-Moya discloses: the secured policy is iteratively updated based on an off-policy mechanism and the agent is an exploitation agent that is iteratively updated based on the iteratively updated secure policy (Grau-Moya: para. 61).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zhu, Jindal and Wray with the teaching of Grau-Moya to meet the preceding limitations. One of ordinary skill in the art would have been motivated to make such modification since such 
As per claim 11, the rejection of claim 1 is incorporated herein. Zhu teaches: the decision process is a Markov Decision Process (MDP) (Zhu: para. 21).

Allowable Subject Matter
The subject matter of claims 4, 5, 7, 10, 12 and 20 is not suggested by the prior art of record. Claims 4, 5, 7, 10, 12 and 20 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the objections set forth in this Office Action are overcome.
The invention defined in claim 12 is not suggested by the prior art of record. 
The prior art of record (in particular, Zhu; Yadong US-20180165745-A1, JINDAL; Ishan US-20190339087-A1, Wray; Kyle Hollins US-20200005645-A1, Masurekar; Uday US-20170317974-A1, Pappada; Scott US-20150227710-A1, Yoshiike; Yukiko US-20100318478-A1, GAO; Jianfeng US-20190324795-A1, Ring; Mark US-20160012338-A1, SRIVASTAVA; Saurabh US-20190361579-A1 and YAMAMOTO; Kazeto US-20210065027-A1), singly or in combination, does not disclose, with respect to independent claim 12, “iteratively updating the secured policy based on the iteratively updated security cap, wherein the updated secured policy is less than the security cap and is reduced by an amount that is based on the one or more probabilities that the agent’s particular state is the dead-end state of the agent” in combination with the other claimed features as a whole.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GHODRAT JAMSHIDI whose telephone number is (571)270-1956. The examiner can normally be reached 10:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carl Colin, can be reached at 571-272-3862. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GHODRAT JAMSHIDI/           Primary Examiner, Art Unit 2493