DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, 10-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Camilus et al. (US 20190378020, hereinafter Camilus) in view of Kojima (US 20130031036 A1, hereinafter Kojima).
Regarding Claim 1, Camilus teaches a processor implemented method, comprising: 
obtaining, via one or more hardware processors, input data comprising (i) design specification of one or more controllable electrical equipment installed and operating in a building and (ii) design details of the building associated thereof (see [0014]; Camilus: “the method includes generating, by the one or more processing circuits, a user interface including an input interface element for receiving or defining the building model of the building (ii), a first output interface element indicating the optimal equipment setting (i), and a second output interface element indicating energy savings resulting from operating at the optimal equipment setting (i).”); 
generating, via the one or more hardware processors, a simulation model using the input data (see [0071]; Camilus: “the building energy system 400 can be configured to perform data simulation and pre-training. The building energy system 400 can be configured to simulate energy usage data.” See [0078]; Camilus: “The simulator 416 may be and/or may be similar to the software application EnergyPlus.”); 
training, via the one or more hardware processors, a plurality of deep Reinforcement learning (RL) agents using the generated simulation model (see [0071]; Camilus: “The building energy system 400 can be configured to simulate energy usage data, train a predictive building model based on the simulated energy usage data.” See [0042]; Camilus: “The models can be trained using reinforcement learning and the models can be used to predict building parameters such as setpoint temperature, humidity, illumination, etc.”); 
deploying, via the one or more hardware processors, each … (see [0097]; Camilus: “At the end of training, the Q-matrix is a trained model that knows to predict building parameters for a day given the forecasted temperature of the day.” See [0093]; Camilus: “For varying building parameters such as setpoint temperature of a day, the agent can update its knowledge base, a Q-matrix”), wherein each of … (see [0093]; Camilus: “Reinforcement learning may be a deep learning method in which an agent is trained to select actions (e.g., optimal actions) from its current state based on a reward policy.”), wherein during an execution of each of … (see [0096]; Camilus: “During training, the Q-learning agent can be rewarded if a setpoint is within ASHRAE recommendation of thermal comfort while penalized if it is not.”); 
triggering, via the one or more hardware processors, each of … (see [0094]; Camilus: “The reward can be a function of total energy consumed and comfort of the user. The comfort can be a cost function that quantifies user comfort. The reward may be maximum if the total energy consumed is less and comfort of the user is high. The reward may be minimum if the total energy consumed is high and comfort of the user is low. The Q-matrix can be updated for a pre-determined number of episodes (step 520) such that the agent learns from several sample building parameters and their corresponding energy consumption and user comfort.”); and 
estimating, via the one or more hardware processors, a global optimal control parameter list based on an optimal control parameter associated with each of … (see [0096]; Camilus: “The knowledge base of the agent can be saved as a model and used to predict temperature setpoint (predictions 534) for the following month (e.g., October) based on ambient conditions e.g., outside temperature (e.g. the outdoor weather data 532)”. See [0039]; Camilus: “The building energy system can be configured to perform deep learning to realize energy savings by generating optimal control settings for building equipment.”), wherein the optimal control parameter is learnt by each of … (see [0099]; Camilus: “There could be prediction models for occupancy level itself and the output of such models can be feed to Q-learning models to predict the optimal setpoint temperatures.”).
However, Camilus does not explicitly teach: … the plurality of trained deep RL agents; and triggering … each of the plurality of trained deep RL agents, to obtain a portion of the reward function associated with another deep RL agent.
Kojima from the same or similar field of endeavor teaches:
… the plurality of trained deep RL agents (see [0037]; Kojima: “The agents 21 to 23 learn through reinforcement learning whether or not to activate any one of the SON applications and which of the SON applications is to be activated according to the state of the wireless communication network 3”); and 
triggering… the plurality of trained deep RL agents, to obtain a portion of the reward function associated with another deep RL agent (see [0046]; Kojima: “the agents 21 to 23 each determine the “reward” by the state variables of the wireless communication network 3. In a certain embodiment, the values of the state variables of the wireless communication network 3 are calculated by weighting and scalarizing them. When the state variables are used to form the reward in reinforcement learning, the state variables may be referred to as the “reward constituent elements”.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Camilus to include Kojima’s features of the plurality of trained deep RL agents and of triggering the plurality of trained deep RL agents to obtain a portion of the reward function associated with another deep RL agent. Doing so would improve the policy so as to maximize the total amount of reward that the agent finally receives (Kojima, [0004]).
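
For illustration only, the combination discussed for claim 1 can be sketched in Python as follows. The sketch assumes two tabular agents, a toy reward built from an energy proxy and a comfort band in the manner described at Camilus [0094] and [0096], and a blended reward in which each agent receives a fixed portion of the other agent's reward. All function names, the setpoints, the comfort band, the 0.3 share, the running-mean update, and the exhaustive sweep used in place of random exploration are assumptions made for this example; none is taken from Camilus, Kojima, or the claims.

```python
from itertools import product

SETPOINTS = [20, 22, 24, 26]      # hypothetical setpoint choices, deg C
COMFORT_BAND = (21, 25)           # assumed comfort band, illustrative only
OUTDOOR_STATES = [10, 30]         # toy forecast states, deg C


def local_reward(setpoint, outdoor_temp):
    """Toy reward: less energy and a comfortable setpoint score higher."""
    energy = abs(outdoor_temp - setpoint)   # crude proxy for energy consumed
    comfortable = COMFORT_BAND[0] <= setpoint <= COMFORT_BAND[1]
    return -float(energy) + (5.0 if comfortable else -5.0)


def blended_reward(own, other, share=0.3):
    """An agent's own reward mixed with a portion of another agent's reward."""
    return (1.0 - share) * own + share * other


def train(sweeps=5, share=0.3):
    """Train two tabular agents with running-mean value updates.

    Every joint action is visited in each sweep (instead of random
    exploration) so that the result is deterministic. There is no
    next-state term, so this is a one-step simplification of Q-learning.
    """
    n_act = len(SETPOINTS)
    q = [{s: [0.0] * n_act for s in OUTDOOR_STATES} for _ in range(2)]
    visits = [{s: [0] * n_act for s in OUTDOOR_STATES} for _ in range(2)]
    for _ in range(sweeps):
        for state, a0, a1 in product(OUTDOOR_STATES, range(n_act), range(n_act)):
            actions = (a0, a1)
            rewards = [local_reward(SETPOINTS[a], state) for a in actions]
            for agent in range(2):
                r = blended_reward(rewards[agent], rewards[1 - agent], share)
                a = actions[agent]
                visits[agent][state][a] += 1
                # Running mean: step size is 1 / (number of visits so far).
                q[agent][state][a] += (r - q[agent][state][a]) / visits[agent][state][a]
    return q


def global_setpoint_list(q):
    """Gather each agent's greedy setpoint per state into one list."""
    return {
        s: [SETPOINTS[max(range(len(SETPOINTS)), key=q[agent][s].__getitem__)]
            for agent in range(len(q))]
        for s in OUTDOOR_STATES
    }
```

Under these assumptions, `global_setpoint_list(train())` selects 22 for both agents at the 10-degree state and 24 for both agents at the 30-degree state. Because the shared portion adds the same average amount to every action of a given agent, it does not change which setpoint that agent prefers in this toy setting.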

Regarding Claim 2, the combination of Camilus and Kojima teaches all the limitations of claim 1 above. Camilus further teaches wherein the one or more controllable electrical equipment comprises one of one or more heating, ventilation, and air conditioning (HVAC) subsystems, one or more lighting equipment, computing loads systems or combinations thereof (see [0095]; Camilus: “These predicted parameters can be set on HVAC controllers and/or thermostats through software-hardware interactions, e.g., via Modbus/Lontalk.” See [0044]; Camilus: “A BMS can include, for example, a HVAC system, a security system, a lighting system, a fire alerting system, any other system that is capable of managing building functions or devices, or any combination thereof.”).

Regarding Claim 3, the combination of Camilus and Kojima teaches all the limitations of claim 1 above. Camilus further teaches wherein the one or more states comprise at least one of temperature, humidity, one or more ambient parameters, lighting intensity, and occupant density (see [0072]; Camilus: “several operating conditions (e.g., ambient temperature, building occupancy, setpoint temperature, setpoint humidity, etc.)”. See [0043]; Camilus: “the building energy system can also be used to control environmental settings such as humidity, lighting, air quality, acoustics, etc.”).

Regarding Claim 5, the combination of Camilus and Kojima teaches all the limitations of claim 1 above. Camilus further teaches wherein the optimal control parameter comprises at least one of set point temperature, lighting intensity set point, and scheduling information of an associated controllable electrical equipment (see [0008]; Camilus: “the optimal equipment operating setting is at least one of a temperature setpoint, a humidity setpoint, an air quality setting, or a light level setting.”).

Regarding Claim 7, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 2.

Regarding Claim 8, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 3.

Regarding Claim 10, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 5.

Regarding Claim 12, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 2.

Regarding Claim 13, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 3.

Regarding Claim 15, the limitations in this claim are taught by the combination of Camilus and Kojima as discussed in connection with claim 5.

Claims 4, 9, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Camilus in view of Kojima and further in view of Turney et al. (CN 110454874 A, hereinafter Turney; see the attached machine translation).
Regarding Claim 4, the combination of Camilus and Kojima teaches all the limitations of claim 1 above; however, it does not explicitly teach wherein the penalty comprises at least one of thermal discomfort, visual discomfort, and stability or degradation information of an associated controllable electrical equipment.
Turney from the same or similar field of endeavor teaches: wherein the penalty comprises at least one of thermal discomfort, visual discomfort, and stability or degradation information of an associated controllable electrical equipment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the combination of Camilus and Kojima to include Turney’s feature wherein the penalty comprises at least one of thermal discomfort, visual discomfort, and stability or degradation information of an associated controllable electrical equipment. Doing so would use the determined current state to drive the indoor air temperature toward the desired temperature and thereby reduce the discomfort of the occupant of the building (Turney, page 3).
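
For illustration only, a penalty of the kind recited in claim 4 can be sketched in Python as a weighted sum of three terms. The function names, the 1-degree tolerance, the 300-lux floor, the switching limit used as a stand-in for equipment degradation, and the weights are all assumptions chosen for this example; they are not taken from Turney, Camilus, Kojima, or the claims.

```python
def thermal_discomfort(indoor_temp, desired_temp, tolerance=1.0):
    """Degrees by which the indoor temperature misses the desired band."""
    return max(0.0, abs(indoor_temp - desired_temp) - tolerance)


def visual_discomfort(lux, min_lux=300.0):
    """Fractional shortfall of illuminance below an assumed minimum."""
    return max(0.0, (min_lux - lux) / min_lux)


def degradation(switch_count, max_switches=6):
    """Excess on/off cycles, used here as a crude wear indicator."""
    return float(max(0, switch_count - max_switches))


def penalty(indoor_temp, desired_temp, lux, switch_count,
            weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three penalty terms (weights are hypothetical)."""
    w_thermal, w_visual, w_wear = weights
    return (w_thermal * thermal_discomfort(indoor_temp, desired_temp)
            + w_visual * visual_discomfort(lux)
            + w_wear * degradation(switch_count))
```

With these assumed values, `penalty(26.0, 22.0, 150.0, 8)` evaluates to 4.5 (3.0 thermal, 0.5 visual, and 1.0 from two excess switching cycles at weight 0.5), and a comfortable, well-lit, lightly cycled state such as `penalty(22.0, 22.0, 500.0, 0)` evaluates to 0.0.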

Regarding Claim 9, the limitations in this claim are taught by the combination of Camilus, Kojima, and Turney as discussed in connection with claim 4.

Regarding Claim 14, the limitations in this claim are taught by the combination of Camilus, Kojima, and Turney as discussed in connection with claim 4.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Matsuoka (US 20200167834 A1) discloses reinforcement learning in which an agent (e.g., a model) can take actions in an environment and learn to maximize rewards and/or minimize penalties that result from such actions.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VI N TRAN whose telephone number is (571)272-1108. The examiner can normally be reached Mon-Fri 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ROCIO PEREZ-VELEZ, can be reached at 571-270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.N.T./Examiner, Art Unit 2117
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2117