DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 13 is directed to “a computer program”. Please note that under the broadest reasonable interpretation of the claims when read in light of the specification, the recited “a computer program” can be software per se. Software per se is not patent eligible. See MPEP 2106.03. Applicant is advised to recite a “non-transitory computer readable storage medium” instead.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu et al., “Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings,” IEEE Transactions on Smart Grid, vol. XX, no. XX, Month 2020, published on 22 July 2020 (hereinafter Yu).

Regarding claim 1, Yu teaches: a method for controlling an energy management system (EMS) that is performed by a computing device including at least one processor, the method comprising:
acquiring a target temperature of one or more target points (Page 3 - - “while Timin and Timax denote the minimum and maximum acceptable indoor temperature at zone i respectively”);
controlling one or more control variables using a reinforcement learning control model (Page 5, - - deep reinforcement learning (DRL) based HVAC control algorithm)  trained for a first condition regarding a state before a current temperature of the target points converges to the target temperature (Page 5-6, section “3) Reward function” - - 
[equation image: media_image1.png]
; when temperature is outside of the acceptable range, ri,3,t(st) is based on temperature difference between the actual temperature and acceptable range; this is a first condition); and
controlling the one or more control variables using the reinforcement learning control model (Page 5, - - deep reinforcement learning (DRL) based HVAC control algorithm) trained for a second condition regarding a state after the current temperature of the target points converges to the target temperature (Page 5-6, section “3) Reward function” - - 
[equation image: media_image1.png]
; when temperature is within the acceptable range, ri,3,t(st) = 0; this is a second condition),
wherein the reinforcement learning control model is trained based on a reward that is calculated differently for the first condition and the second condition respectively (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
ri,3,t(st) is based on whether temperature is within the acceptable range, thus the reward is different in 1st condition and 2nd condition).
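
For illustration only, the two-condition reward structure relied upon above can be sketched as follows. This is a minimal sketch and not Yu's actual formula: the function name comfort_reward, the negative sign convention, and the linear penalty are assumptions made for the example.

```python
def comfort_reward(temp: float, t_min: float, t_max: float) -> float:
    """Hypothetical piecewise comfort reward for one zone at one time slot.

    First condition (temperature outside [t_min, t_max]): the reward is the
    negative distance between the temperature and the acceptable range.
    Second condition (temperature inside the range): the reward is zero.
    """
    if temp > t_max:
        return -(temp - t_max)   # too warm: penalize the excess above t_max
    if temp < t_min:
        return -(t_min - temp)   # too cold: penalize the shortfall below t_min
    return 0.0                   # within the acceptable range: no penalty


# Assumed example values in degrees Celsius.
print(comfort_reward(26.0, 19.0, 24.0))  # outside the range
print(comfort_reward(22.0, 19.0, 24.0))  # inside the range
```

In this sketch the reward is computed differently depending on whether the temperature has entered the acceptable range, which is the distinction the mapping above draws between the first and second conditions.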

Claim 13 is substantially similar to claim 1 and is rejected for the same reasons and rationale as above.

Claim 14 is substantially similar to claim 1 and is rejected for the same reasons and rationale as above.

Regarding claim 2, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the reinforcement learning control model comprises:
a first control agent trained for controlling a first control variable (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” – AHU agent is a first agent, it controls the cooling coil); and
a second control agent trained for controlling a second control variable (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” – zone agent is a second agent, it controls a VAV terminal box; Page 4 - - “damper position in the VAV terminal box”, thus there is a damper in the VAV box of each zone).

Regarding claim 3, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the first control variable is an output of a compressor (Page 3 - - AHU includes a cooling coil, thus it controls a compressor to cool the cooling coil), and the second control variable is a degree of opening and closing of a valve (Page 4 - - “damper position in the VAV terminal box”, thus there is a damper in the VAV box of each zone; the damper is considered a valve).

Regarding claim 4, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the reinforcement learning control model includes an artificial neural network layer including at least one node (Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM”; MADRL uses neural networks, which include nodes), and
wherein a training method of the reinforcement learning control model comprises (Page 8 - - Algorithm 1: Training Algorithm):
acquiring state information from an environment including at least one sensor, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, get initial observation state);
controlling the one or more control variables based on the state information, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, select and send actions);
acquiring the state information updated from the environment as a result of controlling the control variables, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, sample mini-batch B with transitions (o,a,Õ,r) ); and
training the reinforcement learning control model based on the acquired reward from the environment as the result of controlling the control variables (Page 8 - - Algorithm 1: Training Algorithm).
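
For illustration only, the mapped steps (acquire state, control, acquire updated state and reward, train on sampled transitions) can be sketched as a generic interaction loop. This is not a reproduction of Yu's Algorithm 1: the ToyZone environment, its dynamics, the threshold policy, and the update callback are hypothetical stand-ins chosen for the example.

```python
import random


class ToyZone:
    """Hypothetical one-zone environment whose state is the indoor temperature."""

    def __init__(self, temp=28.0, t_min=19.0, t_max=24.0):
        self.temp, self.t_min, self.t_max = temp, t_min, t_max

    def observe(self):
        return self.temp

    def step(self, cooling):
        # Assumed toy dynamics: heat gain adds 0.5 degrees per step and each
        # unit of cooling removes 1 degree.
        self.temp += 0.5 - cooling
        violation = max(self.temp - self.t_max, 0.0) + max(self.t_min - self.temp, 0.0)
        return self.temp, -violation


def run_training(env, policy, update, steps, batch_size, seed=0):
    rng = random.Random(seed)
    buffer = []
    state = env.observe()                       # acquire state information
    for _ in range(steps):
        action = policy(state)                  # control based on the state
        next_state, reward = env.step(action)   # updated state and reward
        buffer.append((state, action, next_state, reward))
        if len(buffer) >= batch_size:
            update(rng.sample(buffer, batch_size))  # train on a mini-batch
        state = next_state
    return buffer


# Usage with a fixed threshold policy; the update callback only records batches.
batches = []
transitions = run_training(
    ToyZone(), lambda s: 1.0 if s > 24.0 else 0.0, batches.append,
    steps=5, batch_size=2,
)
print(len(transitions), len(batches))
```

The update step is left as a callback so that the sketch stays independent of any particular learning rule; a real implementation would replace it with the gradient update of the chosen algorithm.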

Regarding claim 5, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: a reward computed based on the current temperature of the target points and the target temperature (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
ri,3,t(st) is computed based on whether temperature is within the acceptable range);
a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - the sum of ri,1,t and ri,2,t  is total energy consumption, thus it is computed based on total amount of work); or
a reward computed based on a current indirect indicator and a target indirect indicator (Page 5-6, section “3) Reward function” - - the reward for CO2 concentration violation ri,4,t is computed based on a current indirect indicator and a target indirect indicator).
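
For illustration only, the second and third alternatives above can be sketched as separate reward terms. This is a minimal sketch under stated assumptions, not Yu's formulas: the function names, the sign convention, and the use of a CO2 concentration with an upper bound as the indirect indicator are chosen for the example.

```python
def energy_reward(ahu_energy: float, zone_energy: float) -> float:
    """Hypothetical energy term: negative of the summed energy use,
    standing in for a reward based on the total amount of work."""
    return -(ahu_energy + zone_energy)


def indicator_reward(current_co2: float, target_co2_max: float) -> float:
    """Hypothetical indirect-indicator term: penalize only the amount by
    which the current indicator exceeds its target upper bound."""
    return -max(current_co2 - target_co2_max, 0.0)


# Assumed example values (arbitrary energy units, CO2 in ppm).
print(energy_reward(2.0, 3.0))
print(indicator_reward(1200.0, 1000.0))
print(indicator_reward(800.0, 1000.0))
```

Each term depends on a different quantity (energy use versus the current and target indicator values), which is why the mapping above treats them as distinct reward bases.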

Regarding claim 6, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: state information that the reinforcement learning control model acquires from the environment is first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve (Page 3, - - Let Ti,t be the indoor temperature of zone i at slot t; page 4 - - “damper position in the VAV terminal box”; damper position is a degree of opening of a valve).

Regarding claim 7, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches:
training a first control agent comprised in the reinforcement learning control model, based on a reward computed based on the current temperature of the target points and the target temperature (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
ri,3,t(st) is based on whether temperature is within the acceptable range; thus this reward is computed based on the current temperature and target temperature; a 1st agent - AHU agent is trained using this reward); and
training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
the sum of ri,1,t and ri,2,t  is total energy consumption; a 2nd agent – zone agent is trained using this reward).

Regarding claim 8, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches:
training a first control agent comprised in the reinforcement learning control model, based on the current temperature of the target points, the target temperature, and total amount of work (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
ri,3,t(st) is based on whether temperature is within the acceptable range; thus this reward is computed based on the current temperature and target temperature; the sum of ri,1,t and ri,2,t is total energy consumption; a 1st agent - AHU agent is trained using this reward); and
training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
the sum of ri,1,t and ri,2,t  is total energy consumption; a 2nd agent – zone agent is trained using this reward).

Regarding claim 9, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: acquiring a target indirect indicator corresponded from the acquired target temperature (Page 5-6, section “3) Reward function” - - the reward for CO2 concentration violation ri,4,t; the Oimax is interpreted as a target indirect indicator; it corresponds to the target temperature Timin and Timax since they are all for zone i).

Regarding claim 10, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the target indirect indicator is:
a pre-determined value according to the target temperature (Page 5-6, section “3) Reward function” - - the reward for CO2 concentration violation ri,4,t; the Oimax is interpreted as a target indirect indicator; it corresponds to the target temperature Timin and Timax since they are all for zone i); or
a value obtained through at least one sensor from the environment, when the current temperature of the target points converges to the target temperature, as a result of the reinforcement learning control model controlling one or more control variables, which is trained to control one or more control variables based on first state information.
 
Regarding claim 11, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches: second state information additionally including state data for an indirect indicator to first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve (page 4 - - “damper position in the VAV terminal box”; damper position is a degree of opening of a valve).

Regarding claim 12, Yu teaches all the limitations of the base claims as outlined above. 

Yu further teaches:
training a first control agent comprised in the reinforcement learning control model, based on a reward computed based on the current temperature of the target points, the target temperature, and total amount of work (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
ri,3,t(st) is based on whether temperature is within the acceptable range; thus this reward is computed based on the current temperature and target temperature; the sum of ri,1,t and ri,2,t is total energy consumption; a 1st agent - AHU agent is trained using this reward); and
training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on a current indirect indicator and the target indirect indicator (Page 5-6, section “3) Reward function” - - 
[equation image: media_image2.png]
the reward for CO2 concentration violation ri,4,t is computed based on a current indirect indicator and a target indirect indicator; a 2nd agent – zone agent i is trained using this reward).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUHUI R PAN whose telephone number is (571)272-9872. The examiner can normally be reached Monday-Friday 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kenneth M Lo, can be reached at (571) 272-9774. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YUHUI R PAN/
Primary Examiner, Art Unit 2116