DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s amendments to claims 1-20 have overcome each and every rejection made under 35 U.S.C. 101 previously set forth in the office action mailed on 03/15/2022. Therefore, the rejection is withdrawn.
Applicant’s argument regarding claims 1, 12 and 18, that Badgwell et al. fails to teach that the updated policy is optimized to have a highest likelihood of producing a positive change in performance level of the controller rather than optimized to have a highest likelihood of producing a largest positive magnitude of change in the performance level, has been fully considered but is moot in view of newly cited reference Paul et al. in combination with prior art of record Badgwell et al. Paul et al. explicitly teaches that the performance controller, rather than greedily choosing the maximum gain (maximum frequency) for best performance, which might cause the controller to overshoot, chooses a gain that will produce a reasonable boost in the performance of the performance controller without overshooting, as taught in [0020]-[0021], [0034] and [0035]. Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of a controller implementing a policy that produces a corresponding performance level in controlling one or more equipment by providing updated control actions taken by the controller in accordance with the updated policy, identifying updated performance level wherein the updated policy for each iteration is determined using associations generated in previous iterations as taught by Badgwell et al. wherein the updated policy is optimized to have a highest likelihood of producing a positive change in performance level of the controller as taught by Paul et al. to get a reasonable boost in controller performance without overshooting the controller.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5,9,12-15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Badgwell et al. (US 20190187631 A1) in view of Paul et al. (US 20150355692 A1).
	The teachings of Badgwell et al. as disclosed in the previous office action are hereby incorporated by reference to the extent applicable to the amended claims.  	
Regarding claim 1 Badgwell et al. teaches, a method of controlling one or more pieces of equipment using a controller (process PID controller for controlling valve and actuators, [0008]), the method comprising:
the controller controlling one or more pieces of equipment by providing control actions to the one or more pieces of equipment (“...The PID can attempt to control the flowrate relative to a target or setpoint corresponding to the desired flowrate...”, [0032]), wherein the controller implements a policy that produces a corresponding performance level in controlling the one or more pieces of equipment (tuning the PID controller gains (policy) to modify how the PID controller  responds that is sending control actions to control the valve position to reduce the differences between  the measured flow rate and the setpoint flow rate, [0032] and [0033]);
while controlling the one or more pieces of equipment, and during each of a plurality of iterations: updating the policy of the controller (updating the state action value during plurality of actions when reward is received, [0026] and [0042]);
controlling the one or more pieces of equipment by providing updated control actions taken by the controller in accordance with the updated policy (modifying the tuning parameters, determine an action using the updated state action value which corresponds to future actions – control action commands sent by the controller to the valve or to something similar, [0042], [0046] and [0051], see also [0026], [0034] and [0049]); and
identifying an updated performance level of the controller in controlling the one or more pieces of equipment (PID controller’s performance is determined for every updated gain and actions taken by the PID controller, [0033] and [0051]);
associating the updated policy with the updated performance level of the controller in controlling the one or more pieces of equipment (a reward is calculated that corresponds to the combination of state and action that was selected by the PID controller in addition to modifying the tuning parameters as a response to reduce the difference between the measured setpoint and given setpoint, [0051] and [0026]); and
wherein for each iteration, the updated policy is determined using the associations generated during one or more previous iterations between the policies and the corresponding performance levels of the controller in controlling the one or more pieces of equipment (based on the determined state, identify the best possible action from a table that has actions and performance level of the PID controller associated with the determined state, [0046], [0040] and [0045]).
Badgwell et al. does not explicitly teach the updated policy is optimized to have a highest likelihood of producing a positive change in the performance level of the controller in controlling the one or more pieces of equipment rather than optimized to have highest likelihood of producing a largest positive magnitude of change in the performance level of the controller in controlling one or more pieces of equipment relative to the previous iteration. However, Badgwell et al. teaches to pick a gain in between maximum and minimum to avoid sluggish or aggressive controller performance as taught in [0026].
Paul et al. teaches, such that the updated policy is optimized to have a highest likelihood of producing a positive change in the performance level of the controller in controlling the one or more pieces of equipment rather than optimized to have highest likelihood of producing a largest positive magnitude of change in the performance level of the controller in controlling one or more pieces of equipment relative to the previous iteration (performance controller determines a gain, that is, an increase in frequency for the CPU and GPU, such that it has a reasonable increase in performance rather than being greedy by choosing maximum frequency (maximum gain), which might have much higher power consumption for the CPU and GPU, making the performance controller unstable, [0020], [0021] and [0047], see also [0032]).
Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of a controller implementing a policy that produces a corresponding performance level in controlling one or more equipment by providing updated control actions taken by the controller in accordance with the updated policy, identifying updated performance level wherein the updated policy for each iteration is determined using associations generated in previous iterations as taught by Badgwell et al. wherein the updated policy is optimized to have a highest likelihood of producing a positive change in performance level of the controller as taught by Paul et al. to get a reasonable boost in controller performance without overshooting the controller. 
Badgwell et al. teach:
[0033] More generally, the PID controller operates on a process, and it is the process that determines how the controlled variable y moves as the manipulated variable u is adjusted. The overall performance of a PID controller operating on a particular process is determined by the values of the tuning parameters K.sub.c, τ.sub.1, and τ.sub.D. In some aspects, the control gain K.sub.c, can have increased importance, as the control gain K.sub.c, determines the aggressiveness of the control action (large magnitude for aggressive control, small magnitude for passive control). It is noted that other functional forms can be used, but typically other functional forms can also include tuning parameters that include a proportional tuning parameter, an integral tuning parameter, and a derivative tuning parameter. In Equation (1) the proportional tuning parameter corresponds to K.sub.c, although it is understood that other functional forms could include another parameter prior to the first term in the brackets, so that the proportional tuning parameter could be based on a combination of K.sub.c and the additional parameter. In Equation (1) the integral tuning parameter can be viewed as either as τ.sub.1 or as K.sub.c/τ.sub.1. Similarly, depending on a desired method of control, in Equation (1) the derivative tuning parameter can be viewed as either τ.sub.D or as K.sub.c*τ.sub.D.
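
For illustration only, and not as part of the cited disclosure: Equation (1) of Badgwell et al. is not reproduced in this action, so the sketch below assumes the common ideal PID form u = K_c [e + (1/τ_I) ∫e dt + τ_D de/dt], which is consistent with the description in [0033] of a gain K.sub.c multiplying bracketed terms. The class name, field names, and the rectangular integration scheme are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Discrete ideal-form PID (assumed form, not taken from the reference)."""
    kc: float      # controller (proportional) gain, K_c
    tau_i: float   # integral time
    tau_d: float   # derivative time
    dt: float      # sample period, same time units as tau_i and tau_d
    _integral: float = 0.0
    _prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        """Return the manipulated variable u for one sample."""
        error = setpoint - measurement
        self._integral += error * self.dt               # rectangular integration
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kc * (error + self._integral / self.tau_i
                          + self.tau_d * derivative)


# A larger K_c produces a larger (more aggressive) control action
# for the same error, as [0033] describes.
passive = PIDController(kc=2.0, tau_i=10.0, tau_d=0.0, dt=1.0)
aggressive = PIDController(kc=4.0, tau_i=10.0, tau_d=0.0, dt=1.0)
u_passive = passive.step(setpoint=1.0, measurement=0.0)
u_aggressive = aggressive.step(setpoint=1.0, measurement=0.0)
```

Under this assumed form, the integral tuning parameter can be read either as the integral time or as the ratio of gain to integral time, which matches the two readings described in [0033].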

[0026] In various aspects, systems and methods are provided for using a Deep Reinforcement Learning (DRL) agent to provide adaptive tuning of process controllers, such as Proportional-Integral-Derivative (PID) controllers. The agent can monitor process controller performance, and if unsatisfactory, can attempt to improve it by making incremental changes to the tuning parameters for the process controller. The effect of a tuning change can then be observed by the agent and used to update the agent's process controller tuning policy. Tuning changes are implemented as incremental changes to the existing tuning parameters so that the tuning policy can generalize more easily to a wide range of PID loops. The implementation of incremental tuning parameters is important to avoid the implementation of over aggressive changes. For example, a sluggish PID control loop with a controller gain of 5.0. After a few experiments a control engineer might learn that increasing the gain to 10.0 provides acceptable closed-loop behavior. The engineer might conclude, incorrectly, that a controller gain of 10.0 is the best value for all PID loops. The correct conclusion, however, is that doubling the controller gain will make any PID loop more aggressive. The implementation of incremental tuning parameter changes will ensure that the learning agent will learn the right lessons.
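
For illustration only: the distinction drawn in [0026] between an absolute setting ("a gain of 10.0") and an incremental change ("doubling the gain") can be sketched as below. The function name and the example gains other than 5.0 are hypothetical and are not taken from the reference.

```python
def apply_incremental_change(gain: float, factor: float) -> float:
    """Scale the existing gain by a relative factor (2.0 doubles it)."""
    return gain * factor


# The same relative action applies to loops whose gains differ widely,
# whereas an absolute value of 10.0 would suit only the first loop.
existing_gains = [5.0, 0.2, 40.0]
retuned_gains = [apply_incremental_change(g, 2.0) for g in existing_gains]
```

Expressing the action relative to the existing parameter is what allows one tuning policy to carry over to PID loops with different operating ranges.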

[0042] When selecting an action based on the current state, or possibly based on a recent number of states within a time period, a reinforcement learning agent can a select an action based on the state-action value function. In a discrete implementation, the state-action value function can be represented by a plurality of state-action values. In a continuous implementation, the plurality of state-action values can correspond to the continuous set of state-action values that can be determined based on the state-action value function. The state-action value function and/or the state-action values correspond to a function/values accumulated by the agent based on rewards from past actions. When a reward is received, the state-action value function (such as the discrete table of state-action values) can be updated based on the received reward. In some aspects, the state-action values can correspond to a discrete table of combinations of past states and future actions. Optionally, at least some past states can correspond to a plurality of past states, such as a sequence of past states. Optionally, some future actions can correspond to a plurality of actions, such as a sequence of future actions.
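
For illustration only: a discrete state-action value table that is updated when a reward is received, as [0042] describes, might look like the sketch below. The reference does not state a specific update rule in the passage quoted here; the standard tabular Q-learning rule is used as an assumption, and the class name, state labels, action labels, learning rate, and discount are all hypothetical.

```python
from collections import defaultdict


class StateActionTable:
    """Discrete table of (state, action) -> value, updated from rewards."""

    def __init__(self, actions, learning_rate=0.1, discount=0.9):
        self.actions = list(actions)
        self.learning_rate = learning_rate
        self.discount = discount
        self.values = defaultdict(float)  # unseen pairs default to 0.0

    def best_action(self, state):
        """Pick the action with the highest stored value for this state."""
        return max(self.actions, key=lambda a: self.values[(state, a)])

    def update(self, state, action, reward, next_state):
        """Tabular Q-learning update (assumed rule, not from the reference)."""
        best_next = max(self.values[(next_state, a)] for a in self.actions)
        target = reward + self.discount * best_next
        key = (state, action)
        self.values[key] += self.learning_rate * (target - self.values[key])


table = StateActionTable(["increase_gain", "decrease_gain"],
                         learning_rate=0.5, discount=0.0)
table.update("sluggish", "increase_gain", reward=1.0, next_state="on_target")
chosen = table.best_action("sluggish")
```

After a single positive reward, the table favors the rewarded action for that state, which is the association between past actions and observed performance relied upon in the rejection.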

[0046] During operation of a PID controller, a reinforcement learning agent can determine a current state and modify the PID tuning parameters on any convenient schedule. As an example of a possible schedule, the controlled variable for a PID controller may be sampled at a rate, such as three times per second. The PID controller can potentially make changes to the manipulated variable at the same rate (three times per second) or at a different rate, such as a slower rate. The reinforcement learning agent can accumulate data over a plurality of controlled variable values and/or manipulated variable values in order to determine the state. For example, the reinforcement learning agent can determine the state after every 100 measurements of the controlled variable, or every 200 measurements, or every 500 measurements, or based on any other convenient fixed or variable number of measurements. Additionally or alternately, the determination of the state can be based on a desired number of samplings of the manipulated variable and/or based on an elapsed period of time and/or based on any other convenient metric. After determining the state, an action can be determined based on the state-action value function (such as looking up a value in a state-action value table in a discrete implementation). An additional number of measurements of the controlled variable/manipulated variable/other metric can then be accumulated prior to the next determination of the state.
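
For illustration only: the schedule described in [0046], in which the agent accumulates a fixed number of measurements before determining a state, can be sketched as below. How the state is actually derived from the measurements is not specified in the quoted passage; the mean-absolute-error threshold, the two state labels, and the class name are assumptions made for this example.

```python
class StateSampler:
    """Accumulate measurements and emit a state once per full window."""

    def __init__(self, window: int = 100, tolerance: float = 0.05):
        self.window = window        # measurements per state determination
        self.tolerance = tolerance  # hypothetical threshold on mean error
        self._errors = []

    def add(self, setpoint: float, measurement: float):
        """Record one sample; return a state label only when the window fills."""
        self._errors.append(abs(setpoint - measurement))
        if len(self._errors) < self.window:
            return None
        mean_error = sum(self._errors) / len(self._errors)
        self._errors.clear()  # start accumulating for the next determination
        return "on_target" if mean_error <= self.tolerance else "off_target"


sampler = StateSampler(window=3, tolerance=0.1)
states = [sampler.add(1.0, m) for m in (1.0, 0.5, 1.0, 1.0)]
```

The window of 3 is used only to keep the example short; the passage gives 100, 200, or 500 measurements as examples.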

[0051] In FIG. 3, the controller input from one or more of detectors 371, 372, or 373 can also be used by a learning agent 350 to modify the tuning parameters for the PID control module 366. For example, the controller input from the one or more of detectors 371, 372, or 373 can be used by state analysis module 369 to determine one or more states that are associated with the current value of the controlled variable and/or the value of the controlled variable over a period of time. Based on the state determined by the state analysis module 369, the learning agent 350 can select an action to perform based on a stored state-action value function 367. The state-action value function can correspond to a plurality of discrete state-action values, a continuous set of state-action values, or a combination thereof. Based on the selected action, the tuning parameters in control module 366 can be modified, such as by making an incremental change in one or more of the tuning parameters. The modified set of tuning parameters for the proportional, integral, (and optional derivative) terms can then be used by proportional-integral control module 366 for determining the controller output signal to actuator 381 and/or electrical activator 382. At a later point, after one or more additional evaluations of the state by state analysis module 369, a reward can be determined by reward module 368 that corresponds to the combination of state and action that was selected.

[0040] Based on determining a current state, the agent can select an action to perform. The action can be selected based on the value function estimated from various actions performed when in a state corresponding to the current state. This value function is referred to herein as a state-action value function. For a discrete-state, discrete-action example, the state-action value function can be stored as and/or can correspond to a plurality of discrete state-action values in a state-action value table. It is understood that an implicit assumption behind the state-action value function is that the selection of future actions will also be based on use of the state-action value function. In various aspects, because the agent is tuning a PID controller, the actions selected by the agent can correspond to various types of changes in the tuning parameters for the PID controller. Examples of possible actions can include changing the sign and/or the magnitude of the controller gain parameter and/or changing the controller linear-response parameter; changing the magnitude of the controller integral time parameter and/or the integral tuning parameter; or changing the magnitude of the controller derivative time parameter and/or the derivative tuning parameter.

	Paul et al. teach:


[0020] Conventional power management techniques boost the DVFS states to maximize use of the total thermal capacity, a concept referred to as greedily allocating the power within the thermal budget. If the maximum temperatures associated with the thermal budget is not reached, power is allocated until maximum CPU and GPU frequencies are reached. However, just because the CPU and GPU could run at their maximum frequency does not mean that they should; in some embodiments, there should be a reasonable return in performance for the increase in frequency and higher power consumption.

[0021] Rather than using a greedy power allocation algorithm, the performance controller 125 employs frequency sensitivity metrics to provide a measure of the improvement in performance for a unit increase in frequency of the associated core 110, 115. Frequency sensitivity is a time-varying function of the workload of the CPU cores 110 and the GPU cores 115. However, due to performance coupling and thermal coupling of the CPU cores 110 and the GPU cores 115, the workloads cannot be evaluated separately for the homogeneous processing units, but rather, the workloads are evaluated across the heterogeneous cores to account for these dependencies.
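
For illustration only: the contrast in [0020]-[0021] between greedy allocation (run at the maximum frequency whenever the budget allows) and allocation guided by frequency sensitivity (raise frequency only while it yields a reasonable return) can be sketched as below. Paul et al. describes frequency sensitivity as a time-varying function of workload; modelling it as a callable of the current frequency, and the threshold, step size, and function names, are simplifying assumptions made for this example.

```python
def choose_frequency(current_mhz, max_mhz, step_mhz,
                     sensitivity, min_gain_per_mhz):
    """Step frequency up only while the next step gives a reasonable return.

    `sensitivity(freq)` is a hypothetical callable returning the performance
    improvement per unit increase in frequency at `freq`.
    """
    freq = current_mhz
    while (freq + step_mhz <= max_mhz
           and sensitivity(freq) >= min_gain_per_mhz):
        freq += step_mhz
    return freq


def greedy_frequency(current_mhz, max_mhz):
    """Greedy allocation: always go to the maximum permitted frequency."""
    return max_mhz


# Hypothetical workload whose return diminishes above 2000 MHz.
def example_sensitivity(freq_mhz):
    return 1.0 if freq_mhz < 2000 else 0.1


selected = choose_frequency(1000, 3000, 500, example_sensitivity, 0.5)
greedy = greedy_frequency(1000, 3000)
```

In this example the sensitivity-guided choice stops at 2000 MHz while the greedy choice goes to 3000 MHz, reflecting the statement in [0020] that the ability to run at maximum frequency does not mean the processor should.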

Regarding claim 2, the combination of Badgwell et al. and Paul et al. teaches the method of claim 1. In addition, Badgwell et al. teaches, wherein for each iteration, the updated policy is determined using reinforcement learning based on an advantage function, and wherein the updated policy is based on a sign of the advantage function and not on a magnitude of the advantage function (“...In various aspects, because the agent is tuning a PID controller, the actions selected by the agent can correspond to various types of changes in the tuning parameters for the PID controller. Examples of possible actions can include changing the sign and/or the magnitude of the controller gain parameter and/or changing the controller linear-response parameter; changing the magnitude of the controller integral time parameter and/or the integral tuning parameter; or changing the magnitude of the controller derivative time parameter and/or the derivative tuning parameter”, [0040]).
	Regarding claim 3, the combination of Badgwell et al. and Paul et al. teaches the method of claim 1. In addition, Badgwell et al. teaches, controlling the one or more pieces of equipment using the updated policy during each of the plurality of iterations is performed for at least a period of time, wherein the period of time is sufficient to allow a measurable response to the updated control actions taken by the controller in accordance with the updated policy (state analysis module determines one or more states that are associated with the current value of the controlled variable and/or the value of the controlled variable over a period of time. The reinforcement learning agent can then select an action that can be performed based on the state determined by the state analysis module, [0042] and [0045], see also [0064]).

	Regarding claim 4, the combination of Badgwell et al. and Paul et al. teaches the method of claim 1. In addition, Badgwell et al. teaches, wherein the controller is a regulatory controller and the updated policy comprises tuning parameters (PID controller generating controller output based on controller tuning parameters, [0014]).

Regarding claim 5, the combination of Badgwell et al. and Paul et al. teaches the method of claim 4. In addition, Badgwell et al. teaches, wherein the tuning parameters comprise one or more of a Proportional (P) gain, an Integral (I) gain and a Derivative (D) gain (PID controller tuning proportional, integral and derivative gains, [0033] and [0029]).
Regarding claim 9, the combination of Badgwell et al. and Paul et al. teaches the method of claim 1. In addition, Badgwell et al. teaches, wherein the one or more pieces of equipment are at least part of an industrial process (PID controllers controlling various equipment such as valves, electrical activator and others all of which are part of complex systems such as industrial systems, [0063]).

	Regarding claim 12 Badgwell et al. teaches, a method of controlling one or more pieces of equipment using a regulatory controller (process PID controller for controlling valve and actuators, [0008]), the regulatory controller configured to regulate the one or more pieces of equipment, the method comprising:
the regulatory controller regulating one or more pieces of equipment by providing control actions to the one or more pieces of equipment (“...The PID can attempt to control the flowrate relative to a target or setpoint corresponding to the desired flowrate...”, [0032]), wherein the regulatory controller uses one or more programmable tuning parameters in regulating the one or more pieces of equipment (tuning the PID controller gains (policy) to modify how the PID controller  responds that is sending control actions to control the valve position to reduce the differences between  the measured flow rate and the setpoint flow rate, [0032] and [0033]);
while controlling the one or more pieces of equipment, and during each of a plurality of iterations: updating one or more of the tuning parameters of the regulatory controller (updating the state action value during plurality of actions when reward is received, [0026] and [0042]);
regulating the one or more pieces of equipment using the one or more updated tuning parameters (modifying the tuning parameters, determine an action using the updated state action value which corresponds to future actions – control action commands sent by the controller to the valve or to something similar, [0042], [0046] and [0051], see also [0026], [0034] and [0049]); and
monitoring a performance of how well the regulatory controller controlled the one or more pieces of equipment using the one or more updated tuning parameters (PID controller’s performance is determined for every updated gain and actions taken by the PID controller, [0033] and [0051]);
	wherein for each iteration, the one or more updated tuning parameters are determined based at least in part on the performance of how well the regulatory controller performed in controlling the one or more pieces of equipment during one or more previous iterations (based on the determined state, identify the best possible action from a table that has actions and performance level of the PID controller associated with the determined state, [0046], [0040] and [0045]).
Badgwell et al. does not explicitly teach the updated policy is optimized to have a highest likelihood of producing a positive change in the performance level of the controller in controlling the one or more pieces of equipment rather than optimized to have highest likelihood of producing a largest positive magnitude of change in the performance level of the controller in controlling one or more pieces of equipment relative to the previous iteration. However, Badgwell et al. teaches to pick a gain in between maximum and minimum to avoid sluggish or aggressive controller performance as taught in [0026].
Paul et al. teaches, such that the updated one or more tuning parameters are optimized to have a highest likelihood of producing a positive change in the performance of how well the regulatory controller controlled the one or more pieces of equipment rather than optimized to have a highest likelihood of producing a largest positive magnitude of change in the performance of how well the regulatory controller controlled the one or more pieces of equipment relative to the immediate previous iteration (performance controller determines a gain, that is, an increase in frequency for the CPU and GPU, such that it has a reasonable increase in performance rather than being greedy by choosing maximum frequency (maximum gain), which might have much higher power consumption for the CPU and GPU, making the performance controller unstable, [0020], [0021] and [0047], see also [0032]).
Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of a controller implementing a policy that produces a corresponding performance level in controlling one or more equipment by providing updated control actions taken by the controller in accordance with the updated policy, identifying updated performance level wherein the updated policy for each iteration is determined using associations generated in previous iterations as taught by Badgwell et al. wherein the updated policy is optimized to have a highest likelihood of producing a positive change in performance level of the controller as taught by Paul et al. to get a reasonable boost in controller performance without overshooting the controller. 
Regarding claim 13, the combination of Badgwell et al. and Paul et al. teaches the method of claim 12. In addition, Badgwell et al. teaches, the updated one or more tuning parameters are determined using reinforcement learning based on an advantage function, and wherein the updated one or more tuning parameters are based on a sign of the advantage function and not on a magnitude of the advantage function (“...In various aspects, because the agent is tuning a PID controller, the actions selected by the agent can correspond to various types of changes in the tuning parameters for the PID controller. Examples of possible actions can include changing the sign and/or the magnitude of the controller gain parameter and/or changing the controller linear-response parameter; changing the magnitude of the controller integral time parameter and/or the integral tuning parameter; or changing the magnitude of the controller derivative time parameter and/or the derivative tuning parameter”, [0040]).

Regarding claim 14, the combination of Badgwell et al. and Paul et al. teaches the method of claim 12. In addition, Badgwell et al. teaches, controlling the one or more pieces of equipment at least part of the process using the updated one or more tuning parameters during each of the plurality of iterations is performed for at least a period of time, wherein the period of time is sufficient to allow a measurable response to control actions taken by the regulatory controller in accordance with the updated one or more tuning parameters (state analysis module determines one or more states that are associated with the current value of the controlled variable and/or the value of the controlled variable over a period of time. The reinforcement learning agent can then select an action that can be performed based on the state determined by the state analysis module, [0042] and [0045], see also [0064]).

Regarding claim 15, the combination of Badgwell et al. and Paul et al. teaches the method of claim 12. In addition, Badgwell et al. teaches, wherein the one or more tuning parameters comprise one or more of a Proportional (P) gain, an Integral (I) gain and a Derivative (D) gain (PID controller tuning proportional, integral and derivative gains, [0033] and [0029]).
	Regarding claim 18 Badgwell et al. teaches, a controller for controlling at least part of a process (PID controller, [0008]), the controller comprising: a memory for storing a policy of the controller; a processor operatively coupled to the memory (computer – executable instructions that are in memory associated with a processor, [0053]);
	the processor configured to provide one or more control actions to control at least part of the process, wherein one or more control actions are based on the policy (“...The PID can attempt to control the flowrate relative to a target or setpoint corresponding to the desired flowrate...”, [0032] and tuning the PID controller gains (policy) to modify how the PID controller responds that is sending control actions to control the valve position to reduce the differences between the measured flow rate and the setpoint flow rate, [0032], [0033] and [0053]);
the processor configured to perform a plurality of iterations, wherein during each iteration the processor: updates the policy of the controller (updating the state action value during plurality of actions when reward is received, [0026] and [0042]);
provides one or more control actions based on the updated policy to control the at least part of the process (modifying the tuning parameters, determine an action using the updated state action value which corresponds to future actions – control action commands sent by the controller to the valve or to something similar, [0042], [0046] and [0051], see also [0026], [0034] and [0049]);
and associates the updated policy with a performance level of the controller in controlling the at least part of the process (a reward is calculated that corresponds to the combination of state and action that was selected by the PID controller in addition to modifying the tuning parameters as a response to reduce the difference between the measured setpoint and given setpoint, [0051] and [0026]); and
wherein for each iteration, the updated policy is determined using the associations generated during one or more previous iterations between the policies and the corresponding performance levels of the controller in controlling the at least part of the process (based on the determined state, identify the best possible action from a table that has actions and performance level of the PID controller associated with the determined state, [0046], [0040] and [0045]).
Badgwell et al. does not explicitly teach the updated policy is optimized to have a highest likelihood of producing a positive change in the performance level of the controller in controlling the one or more pieces of equipment rather than optimized to have highest likelihood of producing a largest positive magnitude of change in the performance level of the controller in controlling one or more pieces of equipment relative to the previous iteration. However, Badgwell et al. teaches to pick a gain in between maximum and minimum to avoid sluggish or aggressive controller performance as taught in [0026].
Paul et al. teaches, such that the updated policy is optimized to have a highest likelihood of producing a positive change in the performance level of the controller in controlling the one or more pieces of equipment rather than optimized to have highest likelihood of producing a largest positive magnitude of change in the performance level of the controller in controlling one or more pieces of equipment relative to the previous iteration (performance controller determines a gain, that is, an increase in frequency for the CPU and GPU, such that it has a reasonable increase in performance rather than being greedy by choosing maximum frequency (maximum gain), which might have much higher power consumption for the CPU and GPU, making the performance controller unstable, [0020], [0021] and [0047], see also [0032]).
Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of a controller implementing a policy that produces a corresponding performance level in controlling one or more equipment by providing updated control actions taken by the controller in accordance with the updated policy, identifying updated performance level wherein the updated policy for each iteration is determined using associations generated in previous iterations as taught by Badgwell et al. wherein the updated policy is optimized to have a highest likelihood of producing a positive change in performance level of the controller as taught by Paul et al. to get a reasonable boost in controller performance without overshooting the controller. 
Regarding claim 19, the combination of Badgwell et al. and Paul et al. teaches the controller of claim 18. In addition, Badgwell et al. teaches, wherein the processor is configured to determine the updated policy (a processor having an associated memory containing executable instructions that, when executed, provide a method for controlling the manipulated variable for the embodiments of the invention, [0053] and [0077]).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Badgwell et al. (US 20190187631 A1) in view of Paul et al. (US 20150355692 A1) and Attarwala (US 20050137721 A1).
Regarding claim 10, the combination of Badgwell et al. and Paul et al. teaches the method of claim 9.
Badgwell et al. and Paul et al., neither individually nor in combination, teach that the industrial process is a refinery process.
Attarwala teaches, wherein the industrial process comprises a refinery process (Model predictive control is in use in industry for refineries and other process industries, [0006]).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the adaptively tuned PID controller, as taught by the combination of Badgwell et al. and Paul et al., to use the controller in a refinery process as taught by Attarwala, in order to effectively control the processes using the adaptively tuned PID controller.
Claims 6-8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Badgwell et al. (US 20190187631 A1) in view of Paul et al. (US 20150355692 A1) and Bengea (US 20170314800 A1).
Regarding claim 6, the combination of Badgwell et al. and Paul et al. teaches the method of claim 4.
Badgwell et al. and Paul et al., neither individually nor in combination, teach that the one or more pieces of equipment comprises an HVAC actuator.
Bengea et al. teaches, the one or more pieces of equipment comprises an HVAC actuator (the local controllers modify the actuator positions/values based on sensor data pertaining to the individual HVAC components they control, [0019]).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the PID controller, as taught by the combination of Badgwell et al. and Paul et al., to control an HVAC actuator with the PID controller as taught by Bengea et al., in order to reduce manual intervention for HVAC applications as taught by Bengea et al. in [0003] and [0004].

Regarding claim 7, the combination of Badgwell et al., Paul et al. and Bengea et al. teaches the method of claim 6. In addition, Bengea et al. teaches wherein the HVAC actuator comprises a water valve (HVAC components include a water valve, [0019]).

Regarding claim 8, the combination of Badgwell et al., Paul et al. and Bengea et al. teaches the method of claim 6. In addition, Bengea et al. teaches wherein the HVAC actuator comprises an air damper (HVAC components include an air damper, [0019]).
Regarding claim 16, the combination of Badgwell et al. and Paul et al. teaches the method of claim 12.
Badgwell et al. and Paul et al., neither individually nor in combination, teach that the regulatory controller is configured to control an HVAC actuator of an HVAC system.
Bengea et al. teaches that the regulatory controller is configured to control an HVAC actuator of an HVAC system (optimal control system and automatic tuning of the HVAC system, [0015]).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the PID controller, as taught by the combination of Badgwell et al. and Paul et al., to control an HVAC actuator with the PID controller as taught by Bengea et al., in order to reduce manual intervention for HVAC applications as taught by Bengea et al. in [0003] and [0004].

Claims 11, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Badgwell et al. (US 20190187631 A1) in view of Paul et al. (US 20150355692 A1) and Maturana (US 10416660 B2).
Regarding claim 11, the combination of Badgwell et al. and Paul et al. teaches the method of claim 1.
Badgwell et al. and Paul et al., neither individually nor in combination, teach wherein the controller is an edge controller operatively coupled to a remote server, and wherein the updated policy for each iteration is determined by the remote server and communicated down to the controller before the controller controls the one or more pieces of equipment using the updated policy.
Maturana et al. teaches wherein the controller is an edge controller operatively coupled to a remote server, and wherein the updated policy for each iteration is determined by the remote server and communicated down to the controller before the controller controls the one or more pieces of equipment using the updated policy (cloud-based control applications can perform remote decision-making for a controlled industrial system based on data collected in the cloud from the industrial system, and issue control commands to the system via the edge device, col. 7, lines 61-65; the edge device can execute on any suitable hardware platform such as a server, col. 12, lines 63-64).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the PID controller implementing the updated policy, as taught by the combination of Badgwell et al. and Paul et al., so that the updated policy for each iteration is determined by the remote server and communicated to the edge controller, as taught by Maturana et al., because the edge device can easily be scaled to accommodate the large quantities of data generated daily by an industrial enterprise (Maturana et al., col. 7, lines 37-43).

Regarding claim 17, the combination of Badgwell et al. and Paul et al. teaches the method of claim 12.
Badgwell et al. and Paul et al., neither individually nor in combination, teach wherein the regulatory controller is an edge controller operatively coupled to a remote server, and wherein the updated policy for each iteration is determined by the remote server and communicated down to the controller before the controller controls the one or more pieces of equipment using the updated policy.
Maturana et al. teaches wherein the regulatory controller is an edge controller operatively coupled to a remote server, and wherein the updated policy for each iteration is determined by the remote server and communicated down to the controller before the controller controls the one or more pieces of equipment using the updated policy (cloud-based control applications can perform remote decision-making for a controlled industrial system based on data collected in the cloud from the industrial system, and issue control commands to the system via the edge device, col. 7, lines 61-65; the edge device can execute on any suitable hardware platform such as a server, col. 12, lines 63-64).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the PID controller implementing the updated policy, as taught by the combination of Badgwell et al. and Paul et al., so that the updated policy for each iteration is determined by the remote server and communicated to the edge controller, as taught by Maturana et al., because the edge device can easily be scaled to accommodate the large quantities of data generated daily by an industrial enterprise (Maturana et al., col. 7, lines 37-43).

Regarding claim 20, the combination of Badgwell et al. and Paul et al. teaches the controller of claim 18.
Badgwell et al. and Paul et al., neither individually nor in combination, teach wherein the processor is configured to: communicate one or more parameters indicative of the performance level of the controller in controlling the at least part of the process to a remote device; and receive the updated policy from the remote device.
Maturana et al. teaches wherein the processor is configured to: communicate one or more parameters indicative of the performance level of the controller in controlling the at least part of the process to a remote device; and receive the updated policy from the remote device (cloud-based control applications can perform remote decision-making for a controlled industrial system based on data collected in the cloud from the industrial system, and issue control commands to the system via the edge device, col. 7, lines 61-65; the edge device can execute on any suitable hardware platform such as a server, col. 12, lines 63-64).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the method using the PID controller implementing the updated policy, as taught by the combination of Badgwell et al. and Paul et al., so that the updated policy for each iteration is determined by the remote server and communicated to the edge controller, as taught by Maturana et al., because the edge device can easily be scaled to accommodate the large quantities of data generated daily by an industrial enterprise (Maturana et al., col. 7, lines 37-43).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ansari et al. (US 20030149675 A1) teaches an intuitive learning processing device that takes actions based on a state learned by reinforcement learning. Based on the output, the probability distribution of the input (actions) is updated, and the next action is then selected based on this updated action probability distribution.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANZUMAN SHARMIN whose telephone number is (571)272-7365. The examiner can normally be reached M and Th 7:30am - 3:30pm and Tue 8:00am-12:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THOMAS LEE can be reached on (571)272-3667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ANZUMAN SHARMIN/Examiner, Art Unit 2115                                                                                                                                                                                                        

/THOMAS C LEE/Supervisory Patent Examiner, Art Unit 2115                                                                                                                                                                                                        

1 Rather than the maximum change in magnitude, a gain is chosen that has the highest probability of producing a reasonable boost in the controller's performance.
2 Not using absolute values.
3 Actions can be performed within allocated time.
4 Not using absolute values.
5 Actions can be performed within allocated time.