Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission for RCE filed on 11/5/21 has been entered.
 	This communication is responsive to the applicant’s amendment filed on 11/5/21. Claims 1, 10, and 16 have been amended. Claims 1-20 are presented for examination.
Response to Arguments
3.	Applicant’s arguments, see Pages 6-8, filed on 11/5/21, with respect to the rejection(s) of claim(s) 1-20 under Badgewell in view of Kumar have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, and due to the amendment to the claims, a new ground(s) of rejection is made further in view of Xu (NEURAL-NETWORK-BASED LEARNING CONTROL FOR THE HIGHSPEED PATH TRACKING OF UNMANNED GROUND VEHICLES, Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002).
4.	Applicant’s arguments, see Page 6, filed on 11/5/21, with respect to the requirement for prima facie obviousness have been fully considered.
	The claimed invention of claim 1 is rejected under Badgewell in view of Kumar and further in view of Xu. As per the rejection, Badgewell teaches the reinforcement learning and 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 5 and 14 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend, or for failing to include all the limitations of the claim upon which they depend.  Claim 5 depends on claim 1, which now includes the limitation of claim 5, a controller including the PID controller. Claim 14 depends on claim 10, which now includes the limitation of claim 5, a controller including the PID controller.
  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 103
 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Badgewell (US PG Pub: 2019/0187631) in view of Kumar (NPL: IEEE (Santosh Kumar: A Comparative Study of PID Tuning Methods Using Anti-Windup Controller, 2012 2nd International Conference on Power, Control and Embedded Systems)), and further in view of Xu (NEURAL-NETWORK-BASED LEARNING CONTROL FOR THE HIGHSPEED PATH TRACKING OF UNMANNED GROUND VEHICLES, Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002).
6.	Regarding claim 1, Badgewell teaches a system for reinforcement learning, comprising: an actor-critic framework comprising an actor and a critic (e.g., In various aspects, creating a reinforcement learning agent for a PID controller tuning problem can include defining appropriate states, rewards, and actions; specifying a structure for the neural networks used for of reinforcement learning agents include, but are not limited to, actor-critic agents and state-action-reward-state-action agents (or SARSA agents)) (Para. [0036]), 
	the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, that a controller gain of 10.0 is the best value for all PID loops. The correct conclusion, however, is that doubling the controller gain will make any PID loop more aggressive. The implementation of incremental tuning parameter changes will ensure that the learning agent will learn the right lessons) (Para. [0026]),
and the critic comprising a critic network (e.g., In various aspects, creating a reinforcement learning agent for a PID controller tuning problem can include defining appropriate states, rewards, and actions; specifying a structure for the neural networks used for actor-critic agents and state-action-reward-state-action agents (or SARSA agents)) (actor and critic agent as a learning agent considered to have their network for tuning process) (Para. [0036]);
	and a controller including the PID controller  (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy) (Para. [0026]) comprising a neural network embedded in the actor-critic framework (e.g., In this example we use a neural network architecture known as Deep Reinforcement Learning. The neural network provides a convenient method for the function approximation, and the deep architecture extracts relevant features from the raw data automatically, eliminating the need for feature engineering. The use of Deep Neural Networks with Reinforcement Learning is known as Deep Reinforcement Learning or DRL. FIG. 6 illustrates the continuous states and actions for a DRL agent that has been tested on a number of different simulated complex processes) (Para. [0060]).
	Badgewell does not specifically teach and which is tuned according to reinforcement learning based tuning including anti-windup tuning.  
	Kumar teaches and which is tuned according to reinforcement learning based tuning including anti-windup tuning (e.g., However, the overall system becomes much more complicated, a typical method to deal with the integrator windup problem is used to tune the and to add an anti-windup compensator to prevent the degradation of performance [2]. Basically in compensator tuning, two different approaches: Conditional integration and back-calculation are used [4]. It is proposed to combine the different approaches in order to overcome these problems) (Section I: Introduction, Col. 2, Ln. 14-24).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Badgewell and Kumar before him/her, to modify the teachings of Badgewell to include the teachings of Kumar with the motivation to prevent the degradation of performance (Kumar: Page 1, Col. 2, Ln. 17-19).
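For illustration only, and not as a reproduction of any code in Kumar or Badgewell, the two anti-windup approaches named in the cited Kumar passage (conditional integration and back-calculation) can be sketched for a discrete-time PI controller as follows. The function name pi_step, the gain values, the saturation limits, and the back-calculation gain kb are assumptions chosen for the example.

```python
def pi_step(error, integral, kp, ki, dt, u_min, u_max,
            mode="back_calculation", kb=1.0):
    """One step of a discrete PI controller with optional anti-windup.

    Hypothetical helper for illustration; returns (saturated_output, new_integral).
    """
    u_unsat = kp * error + ki * integral          # raw controller output
    u_sat = min(max(u_unsat, u_min), u_max)       # actuator saturation

    if mode == "conditional":
        # Conditional integration: stop integrating while the output is
        # saturated and the error would push it further into saturation.
        saturated = u_unsat != u_sat
        pushing_further = (u_unsat - u_sat) * error > 0
        if not (saturated and pushing_further):
            integral += error * dt
    elif mode == "back_calculation":
        # Back-calculation: feed the saturation excess (u_sat - u_unsat)
        # back into the integrator, scaled by the assumed gain kb.
        integral += (error + kb * (u_sat - u_unsat)) * dt
    else:
        # No anti-windup: the integrator grows without bound under saturation.
        integral += error * dt
    return u_sat, integral


def run(mode, steps=100):
    """Hold a constant error of 1.0 against a +/-1.0 actuator limit."""
    integral = 0.0
    for _ in range(steps):
        _, integral = pi_step(1.0, integral, kp=1.0, ki=1.0, dt=0.1,
                              u_min=-1.0, u_max=1.0, mode=mode)
    return integral
```

Under these assumed values, calling run("none") lets the integrator wind up in proportion to the elapsed time, while run("conditional") and run("back_calculation") keep it bounded.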
The combination of Badgewell and Kumar does not specifically teach wherein the PID gains are weights of the actor network, the weights of the actor network initializes the PID gains on each parameter of the PID controller; [and the critic comprising a critic network] including at least one function associated with the actor.
Xu teaches wherein the PID gains are weights of the actor network (e.g., The controller has an actor-critic learning control module to automatically tune the PID gains. The learning control module has a critic neural network and two actor networks and the adaptive heuristic critic (AHC) algorithm is used to update the weights of the networks) (Page 1653), the weights of the actor network initializes the PID gains on each parameter of the PID controller (e.g., In Fig. 2, Yd denotes the desired outputs, Y is the output vector of the vehicle and Yr is the output vector of the reference model. The RL module in Fig. 2 has an actor-critic learning control architecture, which was early studied in [8] and has been successfully applied in several difficult the RL module, which are the actor network, the critic network and the reward function, respectively. The actor network is used to tune the PID gains online by its output ΔK) (Page 1654, Fig. 2).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Badgewell, Kumar, and Xu before him/her, to modify the teachings of Badgewell and Kumar to include the PID gains and actor weight teachings of Xu with the motivation to utilize the robustness of PID control and the optimization ability of learning control (Xu: Page 1653).
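For illustration only, and not drawn from the application or from any cited reference, one way to picture PID gains serving as weights of an actor network is a single linear layer without bias whose inputs are the error, its integral, and its derivative. The class name PIDActor, its method names, and the numeric values are assumptions made for this sketch; the gain-increment method loosely mirrors the gain adjustment ΔK described in the Xu passage.

```python
class PIDActor:
    """Hypothetical single-layer linear actor: u = Kp*e + Ki*int(e) + Kd*de/dt.

    The three layer weights are the PID gains, so initializing the
    weights initializes the gains.
    """

    def __init__(self, kp, ki, kd, dt):
        self.weights = [kp, ki, kd]   # actor weights == PID gains
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def features(self, error):
        # Build the input vector [e, integral of e, derivative of e].
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return [error, self.integral, derivative]

    def forward(self, error):
        # Linear layer without bias: dot product of weights and features.
        x = self.features(error)
        return sum(w * xi for w, xi in zip(self.weights, x))

    def apply_gain_update(self, delta):
        # A learning rule (for example, a critic-driven update) would
        # supply delta; here it is simply added to the weights.
        self.weights = [w + d for w, d in zip(self.weights, delta)]
```

With assumed gains of 2.0, 0.5, and 0.1 and a time step of 0.1, a first error sample of 1.0 gives a control output of 2.0 + 0.05 + 1.0; any later weight update changes the PID gains directly.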
7.	Regarding claim 2, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Kumar further teaches the controller comprises parameters that include an anti-windup parameter (e.g., The proportional derivative part (if the manipulated variable adopted to generate the anti-windup feedback signal)) (Page 2, Col. 2, Ln. 3-5).  
8.	Regarding claim 3, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches the controller allows for constraining of individual parameters (e.g., In some aspects, the controller tuning parameters can include one or more of a proportional tuning parameter; a gain parameter; an integral time parameter; an integral tuning parameter; a derivative time parameter; and a derivative tuning parameter. For example, the controller tuning parameters can include at least one of a proportional tuning parameter and a gain parameter, at least one of an integral tuning parameter and an integral time parameter, and optionally at least one of a derivative tuning parameter and a derivative time parameter) (Para. [0009]).  
9.	Regarding claim 4, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches wherein the actor network is initialized with gains, 
10.	Regarding claim 5, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches the controller comprises a PID (Proportional Integral Derivative) controller (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can 
11.	Regarding claim 6, the combination of Badgewell, Kumar, and Xu teaches the system of claim 1 wherein Badgewell further teaches wherein weights associated with the actor are initialized with selected PID gains (e.g., In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, that a controller gain of 10.0 is the best value for all PID loops. The correct conclusion, however, is that doubling the controller gain will make any PID loop more aggressive. The implementation of incremental tuning parameter changes will ensure that the learning agent will learn the right lessons) (Para. [0026]).
12.	Regarding claim 7, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Kumar further teaches the PID controller comprises a (Proportional- Derivative) 
13.	Regarding claim 8, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Badgewell further teaches the PID controller comprises an integral portion (e.g., In this discussion, it is understood that references to a PID controller also include proportional-integral (PI) controllers that operate using only a proportional term and an integral term) (Para. [0029]).  
14.	Regarding claim 9, the combination of Badgewell, Kumar, and Xu teaches the system of claim 5 wherein Kumar further teaches the PID controller comprises a PD (Proportional-Derivative) portion (e.g., The proportional derivative part (if the manipulated variable adopted to generate the anti-windup feedback signal)) (Page 2, Col. 2, Ln. 3-5) and an integral portion (Refer to Fig. 5, control variable and integral term with conditional integration and back-calculation).  
15.	Regarding claim 10, claim 10 recites a system with substantially the same limitations as system claim 1. Therefore, the rejection applied to claim 1 also applies to claim 10. Badgewell further discloses at least one processor; and a non-transitory computer-usable medium embodying computer program code, said computer-usable medium capable of communicating with said at least one processor, said computer program code comprising instructions executable by said at least one processor and configured for (e.g., The first process controller can further include a processor having an associated memory containing executable instructions that, when executed, provide a method according to any of the aspects described above) (Para. [0013]).
16.	Regarding claims 11-15, applicant is directed to the citations for claims 2-6 above. 
17.	Regarding claims 16-19, claims 16-19 recite a method that implements the system of claims 1-4, with substantially the same limitations, respectively. Therefore, the rejection applied to claims 1-4 also applies to claims 16-19.
18.	Regarding claim 20, applicant is directed to the citation for claim 6 above. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIGNESHKUMAR C PATEL whose telephone number is (571)270-0698.  The examiner can normally be reached on Monday - Friday, 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kenneth M Lo, can be reached at (571)272-9774.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 






/JIGNESHKUMAR C PATEL/Primary Examiner, Art Unit 2116