DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The amendment to the claims overcame the rejections under 35 U.S.C. 101, made in the previous Office Action.

Allowable Subject Matter
Claim 12 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Amendment
Applicant’s arguments, filed 10/14/2022, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 – 8, 10, 11, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. “Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings” from “IEEE TRANSACTIONS ON SMART GRID, VOL. XX, NO. XX, MONTH 2020” published on 22 JULY 2020 (hereinafter Yu) in view of TARBELL et al. US 2021/0095898 (hereinafter TARBELL).
Regarding claim 1, Yu teaches: a method for controlling an energy management system (EMS) that is performed by a computing device including at least one processor, the method comprising:
acquiring a target temperature of one or more target points (Page 3 - - “while Timin and Timax denote the minimum and maximum acceptable indoor temperature at zone i respectively”);
controlling one or more control variables using a reinforcement learning control model (Page 5, - - deep reinforcement learning (DRL) based HVAC control algorithm) trained for a first condition regarding a state before a current temperature of the target points converges to the target temperature (Page 5-6, section “3) Reward function” - - [image: media_image1.png]; when temperature is outside of the acceptable range, ri,3,t(st) is based on the temperature difference between the actual temperature and the acceptable range; this is a first condition); and
controlling the one or more control variables using the reinforcement learning control model (Page 5, - - deep reinforcement learning (DRL) based HVAC control algorithm) trained for a second condition regarding a state after the current temperature of the target points converges to the target temperature (Page 5-6, section “3) Reward function” - - [image: media_image1.png]; when temperature is within the acceptable range, ri,3,t(st) = 0; this is a second condition),
wherein the reinforcement learning control model is trained based on a reward that is calculated differently for the first condition and the second condition respectively (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; ri,3,t(st) is based on whether temperature is within the acceptable range, thus the reward is different in the first condition and the second condition).
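For illustration only, the piecewise behavior relied upon above can be sketched as follows. Only two properties come from the cited passage of Yu: the term is zero when the temperature is within the acceptable range, and it depends on the temperature difference when outside that range. The function name `comfort_reward`, the negative sign convention, and the linear form of the penalty are assumptions of this sketch and are not asserted to be the equation of Yu.

```python
def comfort_reward(temp: float, t_min: float, t_max: float) -> float:
    """Hypothetical comfort term: 0 inside [t_min, t_max], negative outside.

    Assumption: the penalty grows linearly with the distance from the
    acceptable range. The cited reference may use a different form.
    """
    if temp < t_min:
        # First condition (below the range): penalty based on the shortfall.
        return -(t_min - temp)
    if temp > t_max:
        # First condition (above the range): penalty based on the excess.
        return -(temp - t_max)
    # Second condition: temperature is within the acceptable range.
    return 0.0
```

Calling `comfort_reward(17.0, 19.0, 24.0)` exercises the out-of-range branch, while `comfort_reward(21.0, 19.0, 24.0)` exercises the in-range branch, so the reward is computed differently for the two conditions.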

But Yu does not explicitly teach: 
acquiring a target indirect indicator corresponding to the acquired target temperature;
wherein the target indirect indicator includes a value obtained through at least one sensor from the environment, when the current temperature of the target points converges towards the target temperature, based on the reinforcement learning control model controlling one or more control variables, the reinforcement learning control model being trained to control one or more control variables based on a state information.

However, TARBELL teaches:
acquiring a target indirect indicator corresponding to the acquired target temperature ([0040] - - once the DAT remains within the acceptable temperature band, a satisfactory SST setpoint is achieved and this optimal SST setpoint is stored in memory; the SST setpoint is a target indirect indicator; the DAT acceptable temperature band is the acquired target temperature);
wherein the target indirect indicator includes a value obtained through at least one sensor from the environment, when the current temperature of the target points converges towards the target temperature, based on the reinforcement learning control model controlling one or more control variables, the reinforcement learning control model being trained to control one or more control variables based on a state information ([0012] - - evaporator suction pressure sensor; [0040] - - once the DAT remains within the acceptable temperature band, a satisfactory SST setpoint is achieved and this optimal SST setpoint is stored in memory).
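For illustration only, the behavior cited from TARBELL [0040] (storing the SST setpoint once the DAT remains within the acceptable temperature band) can be sketched as follows. The function name `find_sst_setpoint`, the representation of readings as (DAT, SST) pairs, and the `hold` criterion of consecutive in-band samples are assumptions of this sketch; TARBELL is not asserted to define "remains within" in this way.

```python
from typing import Iterable, Optional, Tuple


def find_sst_setpoint(
    samples: Iterable[Tuple[float, float]],
    dat_low: float,
    dat_high: float,
    hold: int = 3,
) -> Optional[float]:
    """Return the SST reading at the moment DAT has stayed in band.

    samples: hypothetical sequence of (DAT, SST) sensor readings.
    hold: assumed number of consecutive in-band DAT samples that counts
          as "remaining" within the acceptable temperature band.
    """
    in_band = 0
    for dat, sst in samples:
        if dat_low <= dat <= dat_high:
            in_band += 1
            if in_band >= hold:
                # The caller would store this value as the target indicator.
                return sst
        else:
            # DAT left the band, so restart the count.
            in_band = 0
    return None
```

The returned value plays the role of the target indirect indicator: it is obtained from a sensor reading at the time the temperature has converged to the acceptable band.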

Yu and TARBELL are analogous art because they are from the same field of endeavor. Both relate to HVAC control systems.

Therefore, before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the above method, as taught by Yu, by incorporating the determination of an optimal SST setpoint, as taught by TARBELL.

One of ordinary skill in the art would have been motivated to make this modification in order to reduce the amount of time required to reach the target temperature, as suggested by TARBELL ([0041]).

Claim 13 is substantially similar to claim 1 and is rejected for the same reasons and rationale as above.

Claim 14 is substantially similar to claim 1 and is rejected for the same reasons and rationale as above.

Regarding claim 2, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the reinforcement learning control model comprises:
a first control agent trained for controlling a first control variable (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” – AHU agent is a first agent, it controls the cooling coil); and
a second control agent trained for controlling a second control variable (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” – zone agent is a second agent, it controls a VAV terminal box; Page 4 - - “damper position in the VAV terminal box”, thus there is a damper in the VAV box of each zone).

Regarding claim 3, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: 
the first control variable and the second control variable are dependent on each other (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” - - cooling coil and VAV damper are dependent on each other to control the zone temperature),
the first control variable is an output of a compressor (Page 3 - - AHU includes a cooling coil, thus it controls a compressor to cool the cooling coil), and the second control variable is a degree of opening and closing of a valve (Page 4 - - “damper position in the VAV terminal box”, thus there is a damper in the VAV box of each zone; the damper is considered a valve), and
the reinforcement learning control model separately controls the output of the compressor and the degree of opening and closing of the valve (Fig.2, Fig. 3, Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM” - - AHU agent controls the cooling coil; zone agent controls the VAV terminal box; thus the output of the compressor and damper are controlled separately).

Regarding claim 4, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the reinforcement learning control model includes an artificial neural network layer including at least one node (Page 6-7, section “III. MADRL-BASED HVAC CONTROL ALGORITHM”; MADRL uses a neural network, which includes nodes), and
wherein a training method of the reinforcement learning control model comprises (Page 8 - - Algorithm 1: Training Algorithm):
acquiring state information from an environment including at least one sensor, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, get initial observation state);
controlling the one or more control variables based on the state information, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, select and send actions);
acquiring the state information updated from the environment as a result of controlling the control variables, by the reinforcement learning control model (Page 8 - - Algorithm 1: Training Algorithm, sample mini-batch B with transitions (o,a,Õ,r) ); and
training the reinforcement learning control model based on the acquired reward from the environment as the result of controlling the control variables (Page 8 - - Algorithm 1: Training Algorithm).
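For illustration only, the sequence of steps mapped above (get observation, select and send an action, acquire the updated state and reward, sample a mini-batch of transitions, train) can be sketched as follows. This is a single-agent tabular Q-learning stand-in and is not the multi-agent algorithm of Yu; the `ToyZone` environment, its dynamics, the reward weights, and all hyperparameters are assumptions of this sketch.

```python
import random
from collections import deque


class ToyZone:
    """Hypothetical one-zone environment; the state is the zone temperature."""

    def __init__(self, temp=28.0, t_min=19.0, t_max=24.0):
        self.temp, self.t_min, self.t_max = temp, t_min, t_max

    def observe(self):
        return self.temp

    def step(self, cooling):
        # Assumed dynamics: +0.3 ambient gain, -0.5 per unit of cooling.
        self.temp += 0.3 - 0.5 * cooling
        comfort = -max(self.t_min - self.temp, 0.0) - max(self.temp - self.t_max, 0.0)
        energy = -0.1 * cooling  # assumed energy cost weight
        return self.temp, comfort + energy


def train(episodes=50, steps=40, batch_size=16, seed=0):
    rng = random.Random(seed)
    actions = (0, 1, 2)
    q = {}                       # (temperature bucket, action) -> value
    buffer = deque(maxlen=1000)  # replay buffer of transitions

    def bucket(temp):
        return int(round(temp))

    for _ in range(episodes):
        env = ToyZone()
        obs = env.observe()  # get initial observation state
        for _ in range(steps):
            # Select an action (epsilon-greedy) and send it to the environment.
            if rng.random() < 0.2:
                act = rng.choice(actions)
            else:
                act = max(actions, key=lambda a: q.get((bucket(obs), a), 0.0))
            nxt, reward = env.step(act)  # updated state and reward
            buffer.append((obs, act, nxt, reward))
            obs = nxt
            # Sample a mini-batch of transitions and update the value table.
            if len(buffer) >= batch_size:
                for o, a, o2, r in rng.sample(list(buffer), batch_size):
                    best_next = max(q.get((bucket(o2), b), 0.0) for b in actions)
                    key = (bucket(o), a)
                    old = q.get(key, 0.0)
                    q[key] = old + 0.1 * (r + 0.9 * best_next - old)
    return q, buffer
```

Each stored transition has the form (observation, action, next observation, reward), which mirrors the mini-batch of transitions referred to in the mapping above.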

Regarding claim 5, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: a reward computed based on the current temperature of the target points and the target temperature (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; ri,3,t(st) is computed based on whether temperature is within the acceptable range);
a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - the sum of ri,1,t and ri,2,t  is total energy consumption, thus it is computed based on total amount of work); or
a reward computed based on a current indirect indicator and a target indirect indicator (Page 5-6, section “3) Reward function” - - the reward for CO2 concentration violation ri,4,t is computed based on a current indirect indicator and a target indirect indicator).

Regarding claim 6, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: state information that the reinforcement learning control model acquires from the environment is first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve (Page 3, - - Let Ti,t be the indoor temperature of zone i at slot t; page 4 - - “damper position in the VAV terminal box”; damper position is a degree of opening of a valve).

Regarding claim 7, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches:
training a first control agent comprised in the reinforcement learning control model, based on a reward computed based on the current temperature of the target points and the target temperature (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; ri,3,t(st) is based on whether temperature is within the acceptable range, thus this reward is computed based on the current temperature and target temperature; a 1st agent, the AHU agent, is trained using this reward); and
training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; the sum of ri,1,t and ri,2,t is total energy consumption; a 2nd agent, the zone agent, is trained using this reward).

Regarding claim 8, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches:
training a first control agent comprised in the reinforcement learning control model, based on the current temperature of the target points, the target temperature, and total amount of work (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; ri,3,t(st) is based on whether temperature is within the acceptable range, thus this reward is computed based on the current temperature and target temperature; the sum of ri,1,t and ri,2,t is total energy consumption; a 1st agent, the AHU agent, is trained using this reward); and
training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on total amount of work (Page 5-6, section “3) Reward function” - - [image: media_image2.png]; the sum of ri,1,t and ri,2,t is total energy consumption; a 2nd agent, the zone agent, is trained using this reward).

Regarding claim 10, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: the target indirect indicator includes a pre-determined value according to the target temperature (Page 5-6, section “3) Reward function” - - the reward for CO2 concentration violation ri,4,t; the Oimax is interpreted as a target indirect indicator; it corresponds to the target temperature Timin and Timax since they are all for zone i).

Regarding claim 11, the combination of Yu and TARBELL teaches all the limitations of the base claims as outlined above. 

Yu further teaches: second state information additionally including state data for an indirect indicator to first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve (page 4 - - “damper position in the VAV terminal box”; damper position is a degree of opening of a valve).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUHUI R PAN whose telephone number is (571)272-9872. The examiner can normally be reached Monday-Friday 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kenneth M Lo, can be reached at (571) 272-9774. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YUHUI R PAN/Primary Examiner, Art Unit 2116