DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3, 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Russel et al, “Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty”, 7 pages, in view of Saumya et al, “Policy iteration for robust nonstationary Markov decision processes”, pages 1613-1628.
With respect to claims 1 and 9, Russel et al teach (pages 1-2, Introduction section: in control policy, safety constraints with respect to model uncertainty are important in real-life applications … there are hard safety constraints on the robot velocities and steering angles … this incorporation is formulated as robust constrained-MDPs) a controller for controlling a system having uncertainties in its dynamics subject to constraints on an operation of the system, comprising: at least one processor; and memory having instructions stored thereon that, when executed by the at least one processor, cause the controller to: acquire historical data of the operation of the system including pairs of control actions and state transitions of the system controlled according to corresponding control actions (Introduction, page 1 to page 3, section 2, Problem Formulation); determine, for the system in a current state, a current control action transitioning a state of the system from the current state to a next state, wherein the current control action is determined [according to a robust and constraint Markov decision process (RCMDP) that uses the historical data to optimize a performance cost of the operation of the system subject to an optimization of a safety cost enforcing the constraints on the operation], wherein a state transition for each of state and action pairs in the performance cost and the safety cost is represented by a plurality of state transitions capturing the uncertainties of the dynamics of the system (Introduction, page 1, section 1, to page 3, section 2, Problem Formulation); and control the operation of the system according to the current control action to change the state of the system from the current state to the next state (page 3, section 2, Problem Formulation: the robust Bellman operator T_P for a state s ∈ S and an ambiguity set P computes the best action with respect to the worst-case realization of the transition probabilities in P).
Russel et al do not expressly disclose implementation of the robust and constraint Markov decision process (RCMDP) based on historical data of the operation of the system, or use of the historical data in the optimization of a safety cost enforcing the constraints on the operation. However, Saumya et al, on page 1614, disclose: “In the above MDPs, the state transition probabilities are assumed to be known. Typically, these transition probabilities are estimated statistically from historical data. The resulting estimation errors are ignored in the above MDPs. Robust MDPs address this limitation by instead assuming that the transition probabilities are only known to reside in the so-called “uncertainty sets”. Roughly speaking, the decision-maker then attempts to find a policy that optimizes the worst-case expected cost over all transition probabilities from these uncertainty sets.” This corresponds to using historical data of the operation of a system to optimize a performance cost of the operation of the system subject to an optimization of a safety cost enforcing the constraints on the operation, as claimed. Therefore, it would have been obvious to one of skill in the art before the effective filing date of the claimed invention to use the teaching of Saumya et al to modify the process of Russel et al to ensure both safety and robustness in performance cost, by finding a policy that optimizes the worst-case expected cost over all transition probabilities from the uncertainty sets of the Markov decision processes (MDPs) (Saumya et al, page 1614).
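For illustration only, the worst-case optimization described in the quoted passage of Saumya et al, and the robust Bellman operator cited from Russel et al, can be sketched as a single backup step. This is a minimal sketch under stated assumptions, not the method of either reference: the uncertainty set is assumed to be a finite list of candidate transition models, the worst case is assumed to be taken independently for each state and action pair, and the names robust_bellman_backup, costs, uncertainty_set and gamma are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type alias: model[s][a][t] = probability of moving s -> t under action a.
Model = List[List[List[float]]]


def robust_bellman_backup(
    values: List[float],
    costs: List[List[float]],
    uncertainty_set: List[Model],
    gamma: float = 0.9,
) -> Tuple[List[float], List[int]]:
    """One robust backup: for each state, pick the action whose worst-case
    expected cost over the candidate models is smallest (assumed finite set)."""
    n_states = len(values)
    new_values: List[float] = []
    policy: List[int] = []
    for s in range(n_states):
        best_cost = float("inf")
        best_action = 0
        for a in range(len(costs[s])):
            # Worst case (largest cost) over all candidate transition models.
            worst = max(
                costs[s][a]
                + gamma * sum(model[s][a][t] * values[t] for t in range(n_states))
                for model in uncertainty_set
            )
            if worst < best_cost:
                best_cost, best_action = worst, a
        new_values.append(best_cost)
        policy.append(best_action)
    return new_values, policy


# Hypothetical two-state, two-action example with two candidate models.
example_values = [0.0, 10.0]
example_costs = [[1.0, 2.0], [0.0, 0.0]]
model_a = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.0, 1.0]]]
model_b = [[[0.0, 1.0], [0.5, 0.5]], [[0.0, 1.0], [0.0, 1.0]]]
example_result = robust_bellman_backup(
    example_values, example_costs, [model_a, model_b], gamma=0.5
)
print(example_result)  # ([4.5, 5.0], [1, 0])
```

In this example, action 0 in state 0 looks cheapest under model_a alone, but its worst case under model_b is more costly than action 1, so the robust backup selects action 1.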
With respect to claims 3 and 11, the combination of Russel et al and Saumya et al teaches wherein the optimization of the safety cost optimizes an optimization variable subject to a hard constraint (Russel et al, Introduction, page 2, section 1: “These safety constraints are important in real-life applications, where one cannot afford to risk violating some given constraints, e.g., in autonomous cars, there are hard safety constraints on the robot velocities and steering angles”).

Claim(s) 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Russel et al, “Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty”, 7 pages, in view of Saumya et al, “Policy iteration for robust nonstationary Markov decision processes”, pages 1613-1628, and further in view of Zheng et al, “Constrained Upper Confidence Reinforcement Learning”, 20 pages.
With respect to claims 2 and 10, the combination of Russel et al and Saumya et al teaches the claimed invention as recited in claims 1 and 9 above, but is silent as to wherein the optimization of the performance cost is a minimax optimization that optimizes the performance cost for a worst-case scenario of values of uncertain parameters causing the uncertainties of the dynamics of the system. Zheng et al, in the same field of constrained reinforcement learning, disclose this limitation on page 2, section 2. It would have been obvious to one of skill in the art before the effective filing date of the claimed invention to use the teaching of Zheng et al to modify the process of combined Russel et al and Saumya et al to optimize the performance cost for a worst-case scenario of values of uncertain parameters causing the uncertainties of the dynamics of the system (Zheng et al, page 2, section 2).

Claim(s) 4-8 and 12-16 are rejected under 35 U.S.C. 103 as being unpatentable over Russel et al, “Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty”, 7 pages, in view of Saumya et al, “Policy iteration for robust nonstationary Markov decision processes”, pages 1613-1628, and further in view of Chow et al, “Lyapunov-based Safe Policy Optimization for Continuous Control”, 21 pages.
With respect to claims 4-6 and 12-14, the combination of Russel et al and Saumya et al teaches the claimed invention as recited in claims 1 and 9 above, but is silent as to wherein the performance cost and the safety cost are optimized using a Lyapunov descent; to enforce that the constraints are satisfied at the current state while reducing a Lyapunov function along the dynamics of the system over subsequent evolution of the state transitions, wherein the auxiliary cost function is a solution of a robust linear programming optimization problem that maximizes a value of the auxiliary cost function that maintains satisfaction of the safety constraints for all possible states of the system with the uncertainties of the dynamics. However, Chow et al, in the same field of control processes, disclose Lyapunov-based safe policy optimization for continuous control (page 1, section 1, Introduction to page 3, section 2, Preliminaries) to optimize performance cost and safety cost. It would have been obvious to one of skill in the art before the effective filing date of the claimed invention to use the teaching of Chow et al to modify the process of combined Russel et al and Saumya et al to optimize the performance cost by using Lyapunov-based safe policy optimization for the safety cost of the system (Chow et al, page 1, section 1, Introduction to page 3, section 2, Preliminaries).
With respect to claims 7-8 and 15-16, the combination of Russel et al and Saumya et al teaches the claimed invention as recited in claims 1 and 9 above, but is silent as to wherein the auxiliary cost function is a weighted combination of basis functions with weights determined by the solution of the robust linear programming optimization problem, and wherein the auxiliary cost function is a weighted combination of basis functions defining a neural network with weights of the neural network determined by the solution of the robust linear programming optimization problem. However, Chow et al, in the same field of control processes, disclose using Policy Gradient Algorithms to provide safe policy optimization in a control cost function with weights of a neural network (page 3, section 2.1, Policy Gradient Algorithms). It would have been obvious to one of skill in the art before the effective filing date of the claimed invention to use the teaching of Chow et al to modify the process of combined Russel et al and Saumya et al to optimize the performance cost by using Policy Gradient Algorithms to provide safe policy optimization in a control cost function with weights of a neural network (Chow et al, page 3, section 2.1, Policy Gradient Algorithms).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Jha et al (US 20210178600) disclose a SYSTEM AND METHOD FOR ROBUST OPTIMIZATION FOR TRAJECTORY-CENTRIC MODEL-BASED REINFORCEMENT LEARNING.
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN BUI whose telephone number is (571)272-2271. The examiner can normally be reached MON-THURS.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ARLEEN M VAZQUEZ, can be reached on 571-272-2619. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN BUI/
Primary Examiner, Art Unit 2865