Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Application
2.	This communication is responsive to the applicant’s amendment filed on 2/22/22. Claims 1, 6-10, 15-16, and 20 have been amended; claims 2, 5, 11, 14, and 17 have been canceled.
Response to Arguments
3.	The rejection of claims 5 and 14 under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, has been withdrawn because claims 5 and 14 have been canceled.
4.	Applicant’s first argument (see Page 7 of the remarks filed on 2/22/22) with respect to claims 1, 10, and 16 has been considered and is not persuasive. Applicant argued that Xu, in its entirety, does not teach or suggest that the PID gains are weights of the actor network, and further does not teach or suggest that the weights of the actor network initialize the PID gains on each parameter of the PID controller.
	Examiner does not agree with the above argument. Xu discloses that the actor network is used to tune the PID gains. As shown in Fig. 2 of Xu, the PID gains are tuned by the actor network output delta K (Page. 1654). The actor network output is broadly interpreted as the weights of the actor network, which are utilized to initialize the PID gains on each parameter of the PID controller. Therefore, Xu teaches the claimed PID gains and actor network weights. The claim does not define the term “weights.” Hence, under the broadest reasonable interpretation (BRI), the prior-art output delta K is interpreted as the claimed weights of the actor network, which are used to tune the PID gains.
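For illustration only, the arrangement attributed above to Fig. 2 of Xu, in which the actor network output delta K adjusts the PID gains online, can be sketched as follows. This is a minimal sketch and is not code from Xu or from the application: the single-layer actor, the tanh squashing, the step bound `max_step`, and every function and variable name are assumptions made for the example.

```python
import math

def actor_delta_k(weights, state, max_step=0.1):
    """Hypothetical one-layer actor: one bounded gain increment per PID gain.

    `weights` has one row per gain (Kp, Ki, Kd); `state` is the observed
    state vector. The tanh bound is an assumption, not taken from Xu.
    """
    delta = []
    for row in weights:
        z = sum(w * s for w, s in zip(row, state))
        delta.append(max_step * math.tanh(z))
    return delta

def tune_gains(gains, delta_k):
    """Apply the actor output delta K to the current gains (Kp, Ki, Kd)."""
    return [k + dk for k, dk in zip(gains, delta_k)]

# Illustrative numbers only.
gains = [2.0, 0.5, 0.1]                          # current Kp, Ki, Kd
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]  # actor weights (assumed)
state = [0.3, 0.2]                               # e.g. tracking error and its rate

delta_k = actor_delta_k(weights, state)
new_gains = tune_gains(gains, delta_k)
print(new_gains)
```

In this sketch the actor weights and the PID gains are stored as separate quantities, and only the actor output delta K is added to the gains.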

	Applicant’s above argument is not persuasive. The claimed limitation is broad and recites a critic network including at least one function associated with the actor. Under the broadest reasonable interpretation, Xu teaches that, based on the value function estimation in the critic network, the actor network estimates the policy gradient using the following formula (Page. 1655). The actor network thus estimates the gradient based on the function estimation in the critic network. Hence, there is a functional relationship between the actor network and the critic network.
Remark: 	Examiner encourages applicant to claim the function of the critic network associated with the actor in order to overcome the present art rejection and move the claims towards allowance.
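For illustration only, the kind of dependency discussed above, in which the actor's update is computed from the critic's value estimate, can be sketched with a generic one-step temporal-difference actor-critic update. This is a textbook form chosen for the example and is not the formula at Page. 1655 of Xu, which is not reproduced in this action. The linear critic, the Gaussian policy with fixed standard deviation `sigma`, the learning rates, and all names are assumptions.

```python
def critic_value(v_weights, features):
    """Linear value estimate V(s) = v . phi(s) (assumed critic form)."""
    return sum(w * f for w, f in zip(v_weights, features))

def actor_critic_step(theta, v_weights, feats, action, reward, next_feats,
                      gamma=0.95, sigma=0.5, lr_actor=0.01, lr_critic=0.1):
    """One generic TD(0) actor-critic update; returns new weights and TD error."""
    # The critic's value estimates produce the temporal-difference error.
    td_error = (reward
                + gamma * critic_value(v_weights, next_feats)
                - critic_value(v_weights, feats))

    # Gaussian policy with mean theta . phi(s):
    # grad_theta log pi(a|s) = (a - mean) / sigma^2 * phi(s)
    mean = sum(t * f for t, f in zip(theta, feats))
    score = [(action - mean) / sigma ** 2 * f for f in feats]

    # The actor step is scaled by the critic-derived TD error.
    new_theta = [t + lr_actor * td_error * g for t, g in zip(theta, score)]
    # The critic moves its own estimate toward the TD target.
    new_v = [w + lr_critic * td_error * f for w, f in zip(v_weights, feats)]
    return new_theta, new_v, td_error

theta, v, td = actor_critic_step(
    theta=[0.0, 0.0], v_weights=[0.0, 0.0],
    feats=[1.0, 0.5], action=0.2, reward=1.0, next_feats=[0.5, 1.0])
print(theta, v, td)
```

When the critic's TD error is zero, the actor weights in this sketch are left unchanged, which is one simple way to see that the actor update is a function of the critic's estimate.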
6.	Applicant also argued that Kumar, in its entirety, does not teach or suggest that the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller.
	Examiner does not agree with the applicant’s above argument, which is not persuasive. Kumar clearly discloses a comparative study of PID tuning methods using an anti-windup controller (refer to the title of the document). Kumar also discloses that the standard tracking anti-windup structure commonly described in the literature is shown in Fig. 5, where T, is denoted as the Tracking Time Constant. Once the controller output exceeds the actuator limits, a feedback signal is generated from the difference of the saturated and the unsaturated control signals and used to reduce the integrator input (refer to Page. 3, col.2, Ln. 6-13). Hence Kumar 
Remark: 	In the remarks filed on 2/22/22 (Page. 10), applicant explained the anti-windup method based on specification Para. [0042]-[0043] and equation 4. Examiner encourages the applicant to incorporate the details and the equation related to the anti-windup process into the claims to overcome the art rejection and move the claims towards allowance.
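For illustration only, the tracking (back-calculation) anti-windup behavior quoted from Kumar above can be sketched in discrete time as follows. This is a minimal sketch and is not taken from Kumar or from the applicant's equation 4: the forward-Euler discretization, the class name `AntiWindupPID`, the parameter name `tt` for the tracking time constant, and the numeric values are assumptions. Setting `tt` to infinity is used here only as a convenient way to switch the anti-windup feedback off for comparison.

```python
class AntiWindupPID:
    """Discrete PID with back-calculation anti-windup (illustrative sketch)."""

    def __init__(self, kp, ki, kd, tt, u_min, u_max, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.tt = tt                      # tracking time constant (assumed name)
        self.u_min, self.u_max = u_min, u_max
        self.dt = dt
        self.integral = 0.0               # integrator state (already scaled by ki)
        self.prev_error = 0.0

    def step(self, error):
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error

        # Unsaturated control signal, then clamp to the actuator limits.
        u_unsat = self.kp * error + self.integral + self.kd * derivative
        u_sat = min(max(u_unsat, self.u_min), self.u_max)

        # Feedback from (saturated - unsaturated) reduces the integrator
        # input whenever the actuator limit is exceeded.
        self.integral += self.dt * (self.ki * error + (u_sat - u_unsat) / self.tt)
        return u_sat

def run(controller, error, steps):
    """Feed a constant error for a number of steps and collect the outputs."""
    return [controller.step(error) for _ in range(steps)]

aw = AntiWindupPID(kp=1.0, ki=1.0, kd=0.0, tt=0.5, u_min=-1.0, u_max=1.0, dt=0.1)
plain = AntiWindupPID(kp=1.0, ki=1.0, kd=0.0, tt=float("inf"),
                      u_min=-1.0, u_max=1.0, dt=0.1)

outputs = run(aw, 1.0, 200)
run(plain, 1.0, 200)
print(aw.integral, plain.integral)
```

With a sustained error that keeps the actuator saturated, the integrator of the anti-windup controller settles at a bounded value, whereas the integrator without the feedback term keeps growing.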

Claim Rejections - 35 USC § 103
 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Badgewell (US PG Pub: 2019/0187631) in view of Kumar (NPL: IEEE (Santosh Kumar: A Comparative Study of PID Tuning Methods Using Anti-Windup Controller, 2012 2nd International Conference on Power, Control and Embedded Systems)), and further in view of Xu (NEURAL-NETWORK-BASED LEARNING CONTROL FOR THE HIGHSPEED PATH TRACKING OF UNMANNED GROUND VEHICLES (Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002)).
8.	Regarding claim 1, Badgewell teaches a system for reinforcement learning, comprising: an actor-critic framework comprising an actor and a critic (e.g., In various aspects, creating a reinforcement learning agent for a PID controller tuning problem can include defining of reinforcement learning agents include, but are not limited to, actor-critic agents and state-action-reward-state-action agents (or SARSA agents)) (Para. [0036]), 
	the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, that a controller gain of 10.0 is the best value for all PID loops. The correct conclusion, however, is that doubling the controller gain will make any PID loop more aggressive. The implementation of incremental tuning parameter changes will ensure that the learning agent will learn the right lessons) (Para. [0026]),
and the critic comprising a critic network (e.g., In various aspects, creating a reinforcement learning agent for a PID controller tuning problem can include defining actor-critic agents and state-action-reward-state-action agents (or SARSA agents)) (actor and critic agent as a learning agent considered to have their network for tuning process) (Para. [0036]);
	and a controller including the PID controller  (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy) (Para. [0026]) comprising a neural network embedded in the actor-critic framework (e.g., In this example we use a neural network architecture known as Deep Reinforcement Learning. The neural network provides a convenient method for the function approximation, and the deep architecture extracts relevant features from the raw data automatically, eliminating the need for feature engineering. The use of Deep Neural Networks with Reinforcement Learning is known as Deep Reinforcement Learning or DRL. FIG. 6 illustrates the continuous states and actions for a DRL agent that has been tested on a number of different simulated complex processes) (Para. [0060]).
Badgewell does not specifically teach and which is tuned according to reinforcement learning based tuning including anti-windup tuning; wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller.
and to add an anti-windup compensator to prevent the degradation of performance [2]. Basically in compensator tuning, two different approaches: Conditional integration and back-calculation are used [4]. It is proposed to combine the different approaches in order to overcome these problems) (Section I: Introduction, Col. 2, Ln. 14-24);
wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller (e.g., the standard tracking anti-windup structure commonly described in the literature is shown in Fig. 5, where T, is denoted as the Tracking Time Constant. Once the controller output exceeds the actuator limits, a feedback signal is generated from the difference of the saturated and the unsaturated control signals and used to reduce the integrator input (anti-windup structure used to generate feedback signal including the parameter to tune the PID controller) (Page. 3, col.2, Ln. 6-13, also Refer to Page. 2, Fig. 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having teachings of Badgewell and Kumar before him/her, to modify the teachings of Badgewell to include the teachings of Kumar with the motivation to prevent the degradation of performance (Kumar: Page. 1, Col. 2, Ln.17-19).
The combination of Badgewell and Kumar does not specifically teach wherein the PID gains are weights of the actor network, the weights of the actor network initializes the PID gains on each parameter of the PID controller; [and the critic comprising a critic network] including at least one function associated with the actor.
PID gains. The learning control module has a critic neural network and two actor networks and the adaptive heuristic critic (AHC) algorithm is used to update the weights of the networks) (Page. 1653), the weights of the actor network initializes the PID gains on each parameter of the PID controller (e.g., In Fig. 2, Yd denotes the desired outputs, Y is the output vector of the vehicle and Yr is the output vector of the reference model. The RL module in Fig. 2 has an actor-critic learning control architecture, which was early studied in [8] and has been successfully applied in several difficult the RL module, which are the actor network, the critic network and the reward function, respectively. The actor network is used to tune the PID gains online by its output ΔK) (Page. 1654, Fig. 2);
 [and the critic comprising a critic network] including at least one function associated with the actor (e.g., Based on the value function estimation in the critic network, the actor network estimates the policy gradient using the following formula) (Page. 1655).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having teachings of Badgewell, Kumar and Xu before him/her, to modify the teachings of Badgewell, and Kumar to include the PID gains and actor weight teachings of Xu with the motivation to utilize the robustness of PID control and the optimization ability of learning control (Xu: Page. 1653).

9.	Regarding claim 3, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches the controller allows for constraining of individual parameters (e.g., In some aspects, the controller tuning parameters can include one or more of a proportional tuning parameter; a gain parameter; an integral time parameter; an integral tuning parameter; a derivative time parameter; and a derivative tuning parameter. For example, the controller tuning parameters can include at least one of a proportional tuning parameter and a gain parameter, at least one of an integral tuning parameter and an integral time parameter, and optionally at least one of a derivative tuning parameter and a derivative time parameter) (Para. [0009]).
10.	Regarding claim 4, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches wherein the actor network is initialized with gains, which are already in use or known to be stabilizing (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, 
11.	Regarding claim 6, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches wherein weights associated with the actor are initialized with selected PID gains (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, that a controller gain of 10.0 is the best value for all PID loops. The correct conclusion, however, is that doubling the controller gain will make any PID loop more aggressive. The implementation of incremental tuning parameter changes will ensure that the learning agent will learn the right lessons) (Para. [0026]).
12.	Regarding claim 7, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Kumar further teaches the PID controller comprises a (Proportional-Derivative) portion (e.g., The proportional derivative part (if the manipulated variable adopted to generate the anti-windup feedback signal) (Page. 2, Col. 2, Ln. 3-5).
13.	Regarding claim 8, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Badgewell further teaches the PID controller comprises an integral portion (e.g., In this discussion, it is understood that references to a PID controller also include proportional-integral (PI) controllers that operate using only a proportional term and an integral term) (Para. [0029]).
14.	Regarding claim 9, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Kumar further teaches the PID controller comprises a PD (Proportional-Derivative) portion (e.g., The proportional derivative part (if the manipulated variable adopted to generate the anti-windup feedback signal) (Page. 2, Col. 2, Ln. 3-5) and an integral portion (Refer to Fig. 5 control variable and integral term with conditional integration and back-calculation).
15.	Regarding claim 10, claim 10 recites a system with substantially the same limitations as system claim 1. Therefore, the rejection applied to claim 1 also applies to claim 10. Badgewell further discloses comprising: at least one processor; and a non-transitory computer-usable medium embodying computer program code, said computer-usable medium capable of communicating with said at least one processor, said computer program code comprising instructions executable by said at least one processor and configured for (e.g., The first process controller can further include a processor having an associated memory containing executable 
16.	Regarding claims 12-13, applicant is directed to the citations for claims 3-4 above.
17.	Regarding claim 15, applicant is directed to the citation for claim 6 above.
18.	Regarding claim 16, claim 16 recites a method that implements the system of claim 1, with substantially the same limitations. Therefore, the rejection applied to claim 1 also applies to claim 16.
19.	Regarding claims 18-19, applicant is directed to the citations for claims 3-4 above.
20.	Regarding claim 20, applicant is directed to the citation for claim 6 above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIGNESHKUMAR C PATEL whose telephone number is (571)270-0698.  The examiner can normally be reached on Monday - Friday, 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kenneth M Lo can be reached on (571)272-9774.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JIGNESHKUMAR C PATEL/Primary Examiner, Art Unit 2116