DETAILED ACTION
Non-Final rejection
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Election/Restrictions
In the response to the restriction requirement, filed 07/07/2022, claims 15-20 have been withdrawn with traverse.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-14 are rejected under 35 U.S.C. 103 as being unpatentable over Tousi et al. (Application of SARSA Learning Algorithm for Reactive Power Control in Power System, 2008) in view of Zhang et al. (Load Shedding Scheme with Deep Reinforcement Learning to Improve Short-term Voltage Stability, 2018).

Regarding Claim 1. Tousi teaches a method to control voltage profiles of a power grid, comprising (abstract; fig. 1): 
forming an autonomous voltage control model with one or more agents (agent: fig. 1); and
coordinating and optimizing reactive power controllers (LS controllers) to regulate voltage profiles in the power grid using a Markov decision process (MDP) operated with reinforcement learning to address control problems in dynamic and stochastic environments (abstract; section II).
Tousi does not explicitly teach the agents being neural networks serving as Deep Reinforcement Learning (DRL) agents; and
training the DRL agents to provide data-driven, real-time and autonomous grid control strategies.
However, Zhang teaches agents that are neural networks serving as Deep Reinforcement Learning (DRL) agents (fig. 2; fig. 4); and
training the DRL agents to provide data-driven, real-time and autonomous grid control strategies (Deep Reinforcement Learning based Training Process: fig. 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tousi so that the agents are neural networks serving as Deep Reinforcement Learning (DRL) agents, and to train the DRL agents to provide data-driven, real-time and autonomous grid control strategies, as taught by Zhang, so as to increase adaptability and achieve rapid computation for controlling voltage recovery and the amount of load shedding in a compact and inexpensive way.
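For illustration only: Tousi's title identifies SARSA as the learning algorithm applied to the MDP. The sketch below shows the standard tabular SARSA update, Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)]. The state and action labels, the learning rate alpha, and the discount factor gamma are hypothetical values chosen for the example and are not taken from Tousi.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update to q in place and return the new Q(s, a).

    q maps (state, action) pairs to value estimates; unseen pairs default to 0.0.
    alpha (learning rate) and gamma (discount factor) are illustrative values only.
    """
    # On-policy target: uses the action a_next actually chosen in s_next.
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical state/action labels for a single reactive power control step.
q = defaultdict(float)
new_value = sarsa_update(q, "low_voltage", "raise_tap", 1.0, "normal", "hold")
print(new_value)  # 0.1
```

Because the target uses the next action actually selected, SARSA is an on-policy method; a DRL agent replaces the table q with a neural network approximator.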

Regarding Claim 2. Zhang further teaches that the DRL agents are trained offline by interacting with offline simulations and historical events, which are periodically updated ("The DRL based under voltage load shedding scheme should be trained firstly in an offline model using time domain simulations": section II; Deep Reinforcement Learning based Training Process: fig. 1).

Regarding Claim 3. Zhang further teaches that the DRL agent provides autonomous control actions once abnormal conditions are detected (abstract; table 1).

Regarding Claim 4. Zhang further teaches after an action is taken in the power grid at a current state, the DRL agent receives a reward from the power grid (reward definition: fig. 1; table 1).  

Regarding Claim 5. Zhang further teaches updating a relationship among action, states and reward in the agent's memory (Definition of Reward Function: section II(B)).  
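For illustration only: one common way for a DRL agent to store the relationship among actions, states and rewards in memory is an experience replay buffer of (state, action, reward, next state) transitions, as used in the DDPG algorithm of Lillicrap et al. (listed in the Conclusion). The class name, capacity, and sample values below are hypothetical and are not drawn from Zhang.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are discarded first once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch, used to decorrelate training samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical bus-voltage states (p.u.) and discrete action indices.
memory = ReplayMemory(capacity=2)
memory.store((1.02, 0.97), 0, -1.0, (1.00, 0.99))
memory.store((1.00, 0.99), 1, 0.5, (1.00, 1.00))
memory.store((1.00, 1.00), 1, 1.0, (1.00, 1.00))
print(len(memory))  # 2: the first transition was evicted
```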

Regarding Claim 6. Tousi further teaches solving a coordinated voltage control problem (abstract; section II-III; fig.1).
Alternatively, Zhang further teaches solving a coordinated voltage control problem (SIMULATION RESULTS: section III).

Regarding Claim 7. Tousi further teaches performing a Markov Decision Process (MDP) that represents a discrete time stochastic control process (page 1199, left column, first paragraph).
Alternatively, Zhang further teaches performing a Markov Decision Process (MDP) that represents a discrete time stochastic control process (DDPG: section II).

Regarding Claim 8. Tousi further teaches using a 4-tuple $(S, A, P_a, R_a)$ to formulate the MDP, where $S$ is a vector of system states, $A$ is a list of actions to be taken, $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ represents the probability of transitioning from a current state $s_t$ to a new state $s_{t+1}$ after taking an action $a$ at time $t$, and $R_a(s, s')$ is a reward received after reaching state $s'$ from a previous state $s$, which quantifies control performance (page 1199, left column, first paragraph).
Alternatively, Zhang further teaches using a 4-tuple $(S, A, P_a, R_a)$ to formulate the MDP, where $S$ is a vector of system states, $A$ is a list of actions to be taken, $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ represents the probability of transitioning from a current state $s_t$ to a new state $s_{t+1}$ after taking an action $a$ at time $t$, and $R_a(s, s')$ is a reward received after reaching state $s'$ from a previous state $s$, which quantifies control performance (DDPG: section II; supported by Lillicrap's Algorithm 1, DDPG algorithm, page 5).
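For illustration only: the 4-tuple can be represented directly as a data structure, with $P_a(s, s')$ and $R_a(s, s')$ stored as tables keyed by (action, state, next state). The expected immediate reward of taking action $a$ in state $s$ is then $\sum_{s'} P_a(s, s') R_a(s, s')$. The two-state, two-action grid example and its probabilities and rewards are hypothetical and are not taken from Tousi or Zhang.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (a, s, s_next)


@dataclass
class MDP:
    states: List[str]      # S
    actions: List[str]     # A
    P: Dict[Key, float]    # P_a(s, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a)
    R: Dict[Key, float]    # R_a(s, s')

    def expected_reward(self, s: str, a: str) -> float:
        """Sum over s' of P_a(s, s') * R_a(s, s'); missing entries count as 0."""
        return sum(self.P.get((a, s, s2), 0.0) * self.R.get((a, s, s2), 0.0)
                   for s2 in self.states)


# Hypothetical two-state, two-action example.
mdp = MDP(
    states=["normal", "violation"],
    actions=["hold", "switch_capacitor"],
    P={("switch_capacitor", "violation", "normal"): 0.8,
       ("switch_capacitor", "violation", "violation"): 0.2},
    R={("switch_capacitor", "violation", "normal"): 1.0,
       ("switch_capacitor", "violation", "violation"): -1.0},
)
print(mdp.expected_reward("violation", "switch_capacitor"))  # about 0.6 (0.8*1.0 + 0.2*-1.0)
```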

Regarding Claim 9. Zhang further teaches that the DRL agent comprises two architecture-identical deep neural networks, including a target network and an evaluation network (actor and critic networks: figs. 2 and 4).
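For illustration only: an architecture-identical target network can be created as a copy of the evaluation network and then updated toward it. The sketch uses the soft update $\theta' \leftarrow \tau\theta + (1 - \tau)\theta'$ described for DDPG by Lillicrap et al. (listed in the Conclusion). The layer sizes, the value of tau, and the helper names make_network and soft_update are hypothetical; plain Python lists stand in for a deep learning framework, and biases are omitted.

```python
import copy
import random


def make_network(layer_sizes, seed=0):
    """Return weight matrices for a fully connected net (biases omitted for brevity)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def soft_update(target, evaluation, tau):
    """theta_target <- tau * theta_eval + (1 - tau) * theta_target, element-wise, in place."""
    for t_layer, e_layer in zip(target, evaluation):
        for t_row, e_row in zip(t_layer, e_layer):
            for j, w in enumerate(e_row):
                t_row[j] = tau * w + (1.0 - tau) * t_row[j]


evaluation_net = make_network([4, 8, 2])
target_net = copy.deepcopy(evaluation_net)  # architecture-identical copy
evaluation_net[0][0][0] += 1.0              # pretend one training step changed a weight
soft_update(target_net, evaluation_net, tau=0.01)
```

A small tau makes the target network trail the evaluation network slowly, which stabilizes the learning targets.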

Regarding Claim 10. Zhang further teaches providing a sub-second control with a phasor measurement unit (PMU) data stream from a wide area measurement system (WAMS) (section II(D)).

Regarding Claim 11. Zhang further teaches that the DRL agent self-learns by exploring control options in a high-dimensional space and moving out of local optima (section II(C)).

Regarding Claim 12. Zhang further teaches performing voltage control by the DRL agent by considering multiple control objectives and security constraints (section II(C); fig. 4).  

Regarding Claim 13. Tousi further teaches a reward is determined based on voltage operation zones with voltage profiles, including a normal zone, a violation zone, and a diverged zone (Rewards: section II(V)).  
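For illustration only: a reward based on voltage operation zones can be computed by classifying each bus voltage as normal, violation, or diverged and scoring the worst outcome. The zone limits (0.95 to 1.05 p.u. normal, 0.8 to 1.25 p.u. violation) and the reward magnitudes below are hypothetical assumptions and are not taken from Tousi.

```python
def zone(v, normal=(0.95, 1.05), violation=(0.8, 1.25)):
    """Classify one bus voltage (p.u.); the limits are hypothetical defaults."""
    if normal[0] <= v <= normal[1]:
        return "normal"
    if violation[0] <= v <= violation[1]:
        return "violation"
    return "diverged"


def voltage_reward(bus_voltages, normal=(0.95, 1.05), violation=(0.8, 1.25)):
    """Hypothetical zone-based reward for a vector of bus voltages."""
    zones = [zone(v, normal, violation) for v in bus_voltages]
    if "diverged" in zones:
        return -100.0                  # large penalty: operation has diverged
    n_violations = zones.count("violation")
    if n_violations:
        return -10.0 * n_violations    # penalty per bus in the violation zone
    return 10.0                        # all buses within the normal zone
```

For example, voltage_reward([1.0, 0.93, 1.07]) returns -20.0 because two buses fall in the violation zone.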

Regarding Claim 14. Tousi further teaches applying a decaying $\varepsilon$-greedy method for learning, with a decaying probability $\varepsilon_i$ of making a random action selection at an $i$-th iteration,
[equation reproduced as an image (media_image1.png) in the original]
and $r_d$ is a constant decay rate (Parameterization: section II(D)).
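For illustration only: the decay equation itself is not recoverable from the image, so the sketch below assumes a common geometric schedule, $\varepsilon_i = \max(\varepsilon_{\min}, \varepsilon_0 r_d^{\,i})$. The initial value eps0, the floor eps_min, and the decay rate value are hypothetical. With probability $\varepsilon_i$ the agent explores with a random action; otherwise it takes the action with the highest estimated value.

```python
import random


def epsilon_at(i, eps0=1.0, r_d=0.99, eps_min=0.05):
    """Assumed schedule: eps_i = max(eps_min, eps0 * r_d**i)."""
    return max(eps_min, eps0 * r_d ** i)


def select_action(q_values, i, rng=random):
    """Epsilon-greedy: random action with probability eps_i, else the greedy action."""
    if rng.random() < epsilon_at(i):
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda k: q_values[k])  # exploit
```

Early iterations are almost entirely exploratory (epsilon_at(0) is 1.0), while late iterations mostly exploit the learned values, which is how random exploration can move the agent out of local optima before it settles.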

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
a) Wang et al. (A Reinforcement Learning Approach to Dynamic Optimization of Load Allocation in AGC System, 2009).
b) Lillicrap et al. (CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING, 2016).
Contact information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-0328. The examiner can normally be reached M-F 9:00 a.m. - 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Breene can be reached on 571-272-4107. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

M.K.I
Primary Examiner
Art Unit 2864



/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2864