DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after December 9, 2016, is being examined under the first inventor to file provisions of the AIA.
An RCE was filed on September 6, 2022, re-opening prosecution.
In the Amendment filed on September 6, 2022, claims 1, 3, 11, 12, and 20 were amended.  Claims 1-20 are currently pending and under examination, of which claims 1, 11, and 20 are independent claims. 

Response to Amendment
The IDSs filed on 09/06/2022 and 09/07/2022 have been considered and entered.

Response to Arguments
Independent claim 1 has been further amended to recite “determining by the processing unit a value of a reinforcement signal based on the at least one current environmental characteristic value, the at least one set point, the at least one updated environmental characteristic value and the set of rules, the value of the reinforcement signal being one of positive or negative reinforcement”. (emphasis added)
To defend the amended feature, on page 11 of the Amendment the following is argued:
With respect to Barret, Figure 4B and paragraphs [0036] to [0051] disclose an embodiment of a reinforced learning component. The computation of a reward for the reinforced learning component is disclosed at paragraph [0045]. The computation of the reward takes into consideration a current temperature “currentTemp” determined at a next state s’ after performing an action “a” for transitioning from state s to state s’ (paragraphs [0036]-[0038]). The currentTemp recited in Barret is interpreted by Applicant as the at least one updated environmental characteristic value recited in independent claim 1. The computation of the reward also takes into consideration a setpoint “setPoint” determined at the state s before performing the action “a” (paragraph [0036]). The computation of the reward further takes into consideration the action “a”. The computation of the reward also takes into consideration an indicator of the room being occupied “isRoomOccupied”. Barret does not disclose when this indicator is determined: at state s or state s’. However, the computation of the reward does not take into consideration a value of the temperature determined at state s before action “a” is performed, which corresponds to the at least one current environmental characteristic value recited in independent claim 1. (Application No. 15/914,610; Amendments dated September 6, 2022; Reply to Office Action dated April 6, 2022.)

Applicant argues that the computation of the reward takes into consideration a current temperature “currentTemp” determined at a next state s’ after performing an action “a” for transitioning from state s to state s’ (paragraphs [0036]-[0038]) and that the currentTemp recited in Barrett is therefore interpreted by Applicant as the at least one updated environmental characteristic value recited in independent claim 1.  The Office respectfully disagrees with this interpretation insofar as it implies that no temperature value determined at state s is considered.  As explained in the previous Office Action, Barrett describes currentTemp in a next state s’ as the “at least one updated environmental characteristic value” recited in independent claim 1.  However, paragraph [0036] of Barrett also clearly states that “FIG. 4B illustrates one embodiment of a reinforced learning component for an isolated (non-networked) unit…In this embodiment, the component (and/or algorithm) initializes its learning parameters, either with pre-loaded information or initialized with arbitrary values 402… The unit begins in a state s that may include such things as the current time, the current temperature, a setpoint temperature, a setpoint humidity, a time to the setpoint, if the heat is on, if the heat is off, the MOS of occupants, and/or the like: % A state space s {time, temperature, humidity, sp temperature, sp % humidity, time to temperature, heat on, heat off, MOS} s={10:05, 19, 38, 21, 40, 10, false, false, 3.5}”. (emphasis added)  The current temperature and/or humidity at state s clearly reads on the “at least one current environmental characteristic value” in “determining by the processing unit a value of a reinforcement signal based on the at least one current environmental characteristic value…”, as recited in amended independent claim 1.
Contrary to the contentions made in the referred portion of the Amendment, the computation of the reward does take into consideration a value of the temperature determined at state s before action “a” is performed, which corresponds to “the at least one current environmental characteristic value”, as recited in amended independent claim 1.
On page 12 of the Amendment, the following is argued:
Thus, the calculation of the reward in Barret is based on a single value of a temperature determined before performing an action “a” for transitioning from state s to state s’. By contrast, independent claim 1 recites the determination of a reinforcement signal taking into consideration the at least one current environmental characteristic value (e.g. a current temperature) received before determining one or more commands for controlling an appliance AND the at least one updated environmental characteristic value (e.g. an updated temperature) received after transmission of the command to the environment controller. 

However, as previously explained and further explained in paragraphs [0032], [0036]-[0038], and [0045] of Barrett, the calculation of the reward is not simply based on a single value of a temperature.  Paragraph [0036] of Barrett further explains that “In this embodiment, the component (and/or algorithm) initializes its learning parameters, either with pre-loaded information or initialized with arbitrary values 402.  It can repeat until it has met a termination condition, which may be a time limit, a number of iterations, a comfort level, a user-defined stopping point, a too-small differentiation between probability sets and/or the like 404.”  Thus, the calculation of the reward in Barrett is based on the current temperature, the set point, a current temperature of the next state, and a policy, which teach “determining by the processing unit a value of a reinforcement signal based on the at least one current environmental characteristic value, the at least one set point, the at least one updated environmental characteristic value and the set of rules”, as recited in amended independent claim 1.
For the particular reasons set forth above, the rejections of independent claim 1 and related dependent claims are maintained.  For similar reasons as those presented above, the rejection of independent claim 11 and related dependent claims and the rejection of independent claim 20 are maintained.
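For illustration only, the claim language at issue (a reinforcement signal determined from a value read before the command, a set point, a value read after the command, and a set of rules) can be sketched in Python as follows. This sketch is not taken from Barrett or from the application: the function name, the tolerance parameter, and the rule that a command earns positive reinforcement when the updated reading moves toward the set point are all assumptions chosen to make the four inputs concrete.

```python
def reinforcement_signal(current_temp, set_point, updated_temp, tolerance=0.5):
    """Return +1 (positive reinforcement) or -1 (negative reinforcement).

    Hypothetical rule set, assumed for illustration: the command is rewarded
    when the updated reading is within `tolerance` of the set point, or is
    closer to the set point than the reading taken before the command.
    """
    error_before = abs(current_temp - set_point)   # reading at state s
    error_after = abs(updated_temp - set_point)    # reading at state s'
    if error_after <= tolerance or error_after < error_before:
        return 1
    return -1


if __name__ == "__main__":
    # Room at 19, set point 21, command moves the room to 20: closer, so +1.
    print(reinforcement_signal(19.0, 21.0, 20.0))
    # Room at 19, set point 21, room drifts to 18: farther, so -1.
    print(reinforcement_signal(19.0, 21.0, 18.0))
```

Under these assumed rules, the sign of the result depends on both temperature readings, which is the distinction argued on pages 11 and 12 of the Amendment.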

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Montalvo (US Patent Publication No. 2010/0088261 A1) (“Montalvo”), in view of Barrett (US Patent Publication No. 2016/0223218 A1) (“Barrett”), and further in view of Gao et al. (US Patent Publication No. 2020/0050178 A1) (“Gao”).
Regarding independent claim 1, Montalvo teaches:
A method for generating a predictive model of a neural network used for controlling an appliance, the method comprising: Montalvo: Claim 1 (“…method for fully automated energy demand curtailment not requiring human intervention…”) Montalvo: Paragraph [0029] (“The apparatus also includes a generator to generate a signal or control data to implement one or more demand reduction actions at least one of an appliance…”) Montalvo: Paragraph [0074] (“…the processor 100 may use, when implementing artificial intelligence methodologies, and desirably in conjunction with use of neural networks…” which reads on “generating a predictive model of a neural network”.)
storing in a memory of a training server a predictive model of a neural network; Montalvo: Paragraph [0060] (“Referring to FIG. 3, in one exemplary embodiment, the DR server 14 may include a processor 100, a memory 102, a communications network interface device 108…”) Montalvo: Paragraph [0074] (“…the processor 100 may use, when implementing artificial intelligence methodologies, and desirably in conjunction with use of neural networks, to order demand reduction actions in a hierarchy or hierarchies to minimize undesirable impact at an end user.”) Montalvo: Paragraph [0077] (“Also, the instructions 104 may include instructions that the processor 100 may execute to generate demand reduction action signals to be transmitted by the communications device 108. These demand reduction action signals may be transmitted to (i) a DR client 16 to provide for control operation of appliances at an end user by the DR client 16, and (ii) a SES device 18 to provide for control of generation and supply of supplemental electrical power to the end user from a supplemental energy source, during the course of a DR event.”) [The artificial intelligence methodologies stored in the memory 102 of the DR server 14 read on “storing in a memory of a training server a predictive model of a neural network”, and the processor reads on “a training server”.]
…
receiving by a processing unit of the training server at least one room characteristic; Montalvo: Paragraph [0060] [As described above.] Montalvo: Paragraph [0084] (“The DR client 16 may be configured similarly to the DR server 14, with a processor 130 and a memory 132 containing instructions 134 and data 136.”) Montalvo: Paragraph [0093] (“The monitor 200 is a device that may be connected to the appliances 170, and … electronic temperature and relative humidity sensors located within an interior and exterior to a facility of the end user. The monitor 200 may generate or collect data representative of environmental information available at the sensors, and also of the operating status of appliances, such as whether the appliance is ON or OFF... The monitor 200 transmits the generated or collected data to the DR client 16 in substantially real time, where the DR client 16 converts the received data into monitoring data…”) [The generated or collected data of the environmental information reads on “at least one room characteristic”.]
receiving by the processing unit at least one current environmental characteristic value and at least one set point from an environment controller via a communication interface of the training server; Montalvo: Paragraphs [0060], [0084], and [0093] [As described above.] Montalvo: Paragraph [0085] (“The user input 142, for example, may serve as an interface that permits a human (“operator”) at an end user to modify set points, such as environmental limits and appliance operation limits, or to opt out of participation in a DR event prior to and/or during a DR event.”) Montalvo: Paragraph [0151] (“…the processor 100 may evaluate monitoring data representative of temperature and/or humidity in the interior space of a facility of an end user impacted by demand rolling, so that the demand reduction action may provide that appliances with certain electric loads are reinstated (turned ON) and other appliances already closest to temperature and/or humidity set points agreed upon by the end user are turned OFF, where different spaces or facilities impacted by the demand rolling may have different set points.”) [The generated or collected data received at the processor 100, 130 reads on “receiving by the processing unit at least one current environmental characteristic value”.  The communications network interface device 108 reads on “via a communication interface”.  The set points, such as environmental limits and appliance operation limits reads on “at least one set point from an environmental controller”.]
…
transmitting by the processing unit the one or more commands for controlling the appliance to the environment controller via the communication interface; Montalvo: Paragraph [0132] (“The processor 100, in turn, in substantially real time, processes the feedback monitoring data to confirm the operating status of the appliance 170, and in particular whether or not the appliance has responded to a control signal transmitted by the DR client 16, based on the demand reduction action signal of the DR server, to turn OFF the appliance. If the appliance has not responded, the DR server 14 determines a further demand reduction action, desirably without human intervention.”) [The demand reduction action from the processor to turn off the appliance reads on “transmitting by the processing unit the one or more commands for controlling the appliance”.]
…
executing by the processing unit a neural network training engine… to update the predictive model of the neural network based on: inputs comprising the at least one current environmental characteristic value, the at least one set point, and the at least one room characteristic; Montalvo: Paragraphs [0074], [0077], and [0060] [As described above.] Montalvo: Paragraph [0122] (“In block 308, the processor 100 of the DR server 14 generates, and transmits over the network 20, a demand reduction action signal to implement the demand reduction action(s) determined at block 306.”) [The processor 100, 130 reads on “executing by the processing unit” and the neural networks read on “a neural network training engine”. The monitoring data reads on “inputs”.  The space temperature reads on “the at least one current environmental characteristic value” and the temperature setting reads on “the at least one set point”.  The generated or collected data reads on “the at least one room characteristic”.]
one or more outputs consisting of the one or more commands; and… Montalvo: Paragraphs [0060], [0074], [0077], and [0122] [As described above.] [The transmission of the demand reduction action reads on “one or more outputs consisting of the one or more commands”.]
Montalvo does not expressly teach “storing in the memory a set of rules… determining by the processing unit one or more commands for controlling an appliance based on the at least one current environmental characteristic value, the at least one set point and the at least one room characteristic; …receiving by the processing unit at least one updated environmental characteristic value from the environment controller via the communication interface; determining by the processing unit a value of a reinforcement signal based on the at least one current environmental characteristic value, the at least one set point, the at least one updated environmental characteristic value and the set of rules, the value of the reinforcement signal being one of positive reinforcement or negative reinforcement; and… executing by the processing unit a neural network training engine implementing reinforcement training to update the predictive model of the neural network based on: …the value of the reinforcement signal”.  However, Barrett describes automated control and parallel learning HVAC apparatuses, methods and systems (“ACPLHVAC”) that update real-time value function estimates through parallel and reinforcement learning, via ACPLHVAC components, by observing a defined state action space to maximize user Quality of Experience (QoE) and minimize the energy required to regulate environmental spaces. Barrett describes:
storing in the memory a set of rules; Barrett: Paragraph [0019] (“Depending on the implementation, the selection of data used to set a base policy can be generalized or it can be specific to one or more details of the thermostat, such as geospacial location, geographic features (e.g., based on the location, what is the tree coverage, for example, as determined by satellite imagery), associated HVAC device(s) (e.g., type, capacity, model, etc. of the heating and/or cooling systems/sensors), user demographics, home/commercial implementation, and/or the like.”) Barrett: Paragraph [0124] (“…the ACPLHVAC controller and/or a computer systemization may employ various forms of memory 529.”) [The base policy stored reads on “a set of rules”.]
…
determining by the processing unit one or more commands for controlling an appliance based on the at least one current environmental characteristic value, the at least one set point and the at least one room characteristic; Barrett: Paragraph [0100] (“At the end of each epoch the ACPLHVAC observes the current state of the environment and chooses whether or not to execute an automated HVAC action (turn on or off). Table 1 below illustrates the typical information available, with readings for current temperature, humidity, set points and whether or not the room is booked at that particular time.”) [Whether or not to execute the automated HVAC action reads on “determining…one or more commands”.  The current temperature and humidity reads on “at least one current environmental characteristic value” and whether or not the room is booked reads on “at least one room characteristic”.]  
…
receiving by the processing unit at least one updated environmental characteristic value from the environment controller via the communication interface; Barrett: Paragraph [0038] (“It observes the value of the next state s′ which is generated executing the action 410.  % Observe s′ the next state currentTemp=getTempReading( ) % Returns reading from thermostat sensor…”) Barrett: Paragraph [0087] (“As far as the this implementation of the ACPLHVAC is concerned, the following parameters may be observed when choosing an action…”) Barrett: Paragraph [0088] (“rt: is the room temperature, (source: temperature sensors, unit: degrees Celsius)”) Barrett: Paragraph [0106] (“FIG. 5 shows a block diagram illustrating embodiments of a ACPLHVAC controller.”) Barrett: Paragraph [0112] (“The CPU interacts with memory through instruction passing through conductive and/or transportive conduits (e.g., (printed) electronic and/or optic circuits) to execute stored instructions (i.e., program code) according to conventional data processing techniques. Such instruction passing facilitates communication within the ACPLHVAC controller and beyond through various interfaces.”) [The return reading of the room temperature from the thermostat sensor at the ACPLHVAC in a next state reads on “receiving by the processing unit at least one updated environmental characteristic value”. The communication of the ACPLHVAC controller beyond through various interfaces with the thermostat sensor reads on “from the environment controller via the communication interface”.]
determining by the processing unit a value of a reinforcement signal based on the at least one current environmental characteristic value, the at least one set point, the at least one updated environmental characteristic value and the set of rules, the value of the reinforcement signal being one of positive reinforcement or negative reinforcement; and… Barrett: Paragraph [0032] (“An ACPLHVAC controller may take any of these actions accorded to it in its action space at the end of any defined discrete time interval (“epoch”). Determining which action to choose at the end of each epoch is the fundamental operating procedure of the unit.”) Barrett: Paragraph [0036] (“FIG. 4B illustrates one embodiment of a reinforced learning component for an isolated (non-networked) unit… The unit begins in a state s that may include such things as the current time, the current temperature, a setpoint temperature, a setpoint humidity, a time to the setpoint, if the heat is on, if the heat is off, the MOS of occupants, and/or the like…”) Barrett: Paragraph [0037] (“The unit can choose an action a (in this embodiment the action space is defined as {turn heat on, turn heat off}) to execute using a policy π 406 and executes that action 408.”) Barrett: Paragraph [0038] (“It observes the value of the next state s′ which is generated executing the action 410.  
% Observe s′ the next state currentTemp=getTempReading( ) % Returns reading from thermostat sensor…”) Barrett: Paragraph [0045] (“Transitioning to s′ results in a reward (which may be positive or negative) 414.”) [The ACPLHVAC controller reads on “the processing unit” and the reinforced learning component of the ACPLHVAC unit based on the current temperature, the setpoint temperature or the setpoint humidity, the current temperature of the next state, and the policy reads on “based on the at least one current environmental characteristic value, the at least one set point, the at least one updated environmental characteristic value and the set of rules”.  The reward being positive or negative reads on “one of positive reinforcement or negative reinforcement”.]
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Montalvo and Barrett before them, to include storing a set of rules, determining the commands for controlling an appliance, receiving an updated environmental characteristic value, and determining a value of a reinforcement signal because the references are in the same field of endeavor as the claimed invention and are focused on fully automated energy management for end users, including comfort management.
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to make this modification because it would improve the efficiency of learning and provide an overall reduction of the processing required to provide comfortable and energy-efficient environments to users. Barrett: Paragraph [0019].  The combination of the cited references would allow the ACPLHVAC to balance state space variable adjustment (such as temperature, humidity, and/or the like) with associated energy costs (such as fuel, equipment wear/tear, and/or the like). Barrett: Paragraph [0029].
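The sequence cited from Barrett paragraphs [0036]-[0045] (begin in state s, choose an action a using a policy, execute it, observe the next state s′, and receive a positive or negative reward) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Barrett's implementation: the room model in execute, the rule in base_policy, the reward rule, and the tabular Q-learning update with parameters alpha and gamma are all hypothetical stand-ins, since the cited passages do not specify them.

```python
ACTIONS = ("heat_on", "heat_off")  # action space quoted from Barrett [0037]


def base_policy(state):
    """Hypothetical rule-based policy: heat when below the set point."""
    return "heat_on" if state["temperature"] < state["sp_temperature"] else "heat_off"


def execute(state, action):
    """Hypothetical room model standing in for the appliance and sensor."""
    delta = 1.0 if action == "heat_on" else -1.0
    next_state = dict(state)
    next_state["temperature"] = state["temperature"] + delta
    return next_state


def reward(state, next_state):
    """Assumed reward: +1 if s' is closer to the set point than s, else -1."""
    before = abs(state["temperature"] - state["sp_temperature"])
    after = abs(next_state["temperature"] - next_state["sp_temperature"])
    return 1.0 if after < before else -1.0


def run_epoch(state, policy, q, alpha=0.5, gamma=0.9):
    """One epoch: choose a, execute it, observe s', compute r, update q in place."""
    action = policy(state)
    next_state = execute(state, action)
    r = reward(state, next_state)
    key = (state["temperature"], action)
    # Assumed tabular Q-learning update; unseen entries default to 0.0.
    best_next = max(q.get((next_state["temperature"], a), 0.0) for a in ACTIONS)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (r + gamma * best_next - old)
    return action, next_state, r


if __name__ == "__main__":
    q_table = {}
    s = {"temperature": 19.0, "sp_temperature": 21.0}
    print(run_epoch(s, base_policy, q_table))
    print(q_table)
```

In this sketch the reward function receives both s and s′, so the temperature read before the action and the temperature read after it both influence the result.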
Montalvo and Barrett do not expressly teach “executing by the processing unit a neural network training engine implementing reinforcement training to update the predictive model of the neural network based on: …the value of the reinforcement signal”.  However, Gao describes methods, systems, apparatus and computer program products for implementing machine learning within control systems. Gao describes:
executing by the processing unit a neural network training engine implementing reinforcement training to update the predictive model of the neural network based on: Gao: Paragraph [0042] (“The machine learning system may include a machine learning model that is a neural network. The machine learning model may be a deep neural network. The neural network may be trained using reinforcement learning based on measured or calculated efficiency of the industrial facility.”) Gao: Paragraph [0062] (“The efficiency management system 100 can train an ensemble of machine learning models 132A-132N using a model training subsystem 160 to predict the resource efficiency of the data center 104 if particular data center settings are adopted. In some cases, the efficiency management system 100 can train a single machine learning model to predict the resource efficiency of the data center if particular data center settings are adopted.”) Gao: Paragraph [0071] (“Each constraint model 112A-112N is a machine learning model, e.g., a deep neural network, that is trained to predict certain values of an operating property of the data center over a period of time if the data center adopts a given input setting.”) Gao: Paragraph [0105] (“…apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.”) [The machine learning system reads on “a neural network training engine”, and the use of reinforcement learning to train a neural network reads on “implementing reinforcement training to update… the neural network”.  The machine learning model(s) reads on “the predictive model”.]
inputs comprising the at least one current environmental characteristic value, the at least one set point, and the at least one room characteristic; Gao: Paragraph [0059] (“The efficiency management system 100 can take in, as input, state data 140 representing the current state of the data center (or other industrial facility) 104. This state data can come from sensor readings of sensors in the data center 104 and operating scenarios within the data center 104. The state data may include data such as any one or more of temperatures, power, pump speeds, and set points.”) Gao: Paragraph [0072] (“…constructs a set of setting slates that represent one or more (typically a plurality of) data center setting values that can be set for various parts of the data center given the known operating conditions…”) Gao: Paragraph [0073] (“For example, the efficiency management system 100 may determine the most resource efficient settings for a cooling system of the data center 104. The cooling system may have the following architecture: (1) servers heat up the air on the server floor;…”) [The temperature reads on “current environmental characteristic value” and the set points reads on “the at least one set point”. The operating conditions and/or the cooling system’s architecture reads on “the at least one room characteristic”.]
one or more outputs consisting of the one or more commands; and Gao: Paragraph [0061] (“Once the efficiency management system 100 determines the data center settings 120 that will make the data center 104 more efficient, the efficiency management system 100 provides the updated data center settings 120 to the control system 102… The control system 102 can send the signal to the data center to increase the number of cooling towers that are powered on and functioning in the data center 104.”) Gao: Paragraph [0074] (“To efficiently control the cooling system, the efficiency management system 100 may construct different potential setting slates that include various temperatures…”) [The updated settings to control equipment including the number of cooling towers reads on “outputs consisting of the one or more commands”.]
the value of the reinforcement signal. Gao: Paragraph [0005] (“Generally, in a reinforcement learning training technique, a reward is received and is used to adjust the values of the parameters of the neural network.”) Gao: Paragraph [0070] (“If the efficiency management system 100 determines that a constraint model predicts that the value of a given data center setting will violate a constraint of the data center, the efficiency management system will discard the violating setting.”) [The reward or the violating setting reads on “the value of the reinforcement signal”.]
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Montalvo, Barrett, and Gao before them, to include the claimed features of the execution function because the references are in the same field of endeavor as the claimed invention.
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to make this modification because, by using a machine learning model that predicts safe, advantageous settings, the system may choose the settings without requiring user input or extensive testing. The system may continually optimize efficiency over time as the operating state or configuration of the industrial facility and its operating conditions change. Gao: Paragraph [0046].
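Gao paragraph [0005], as quoted above, states only that a reward is received and used to adjust the values of the parameters of the neural network. One way such an adjustment could look is sketched below in Python. The sketch assumes a single logistic unit whose output is the probability of issuing a command, and a REINFORCE-style policy-gradient update; that technique, the function names, and the learning_rate parameter are assumptions introduced here for illustration and are not drawn from Gao, Barrett, or Montalvo.

```python
import math


def command_probability(weights, inputs):
    """Probability that a single logistic unit issues the command (action = 1)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def update_weights(weights, inputs, action, signal, learning_rate=0.1):
    """Return new weights after one assumed REINFORCE-style update.

    `signal` is the reinforcement value (+1 or -1). For a logistic unit the
    gradient of log-probability of the taken action is (action - p) * x, so a
    positive signal makes the taken action more likely and a negative signal
    makes it less likely.
    """
    p = command_probability(weights, inputs)
    return [w + learning_rate * signal * (action - p) * x
            for w, x in zip(weights, inputs)]


if __name__ == "__main__":
    # Hypothetical inputs, e.g. scaled temperature error and an occupancy flag.
    x = [1.0, 2.0]
    print(update_weights([0.0, 0.0], x, action=1, signal=+1))
    print(update_weights([0.0, 0.0], x, action=1, signal=-1))
```

Starting from zero weights, the unit outputs a probability of 0.5, so a positive signal moves each weight by half the learning rate times the corresponding input, and a negative signal moves it by the same amount in the opposite direction.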
Regarding claim 2, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the predictive model of the neural network comprises weights of the neural network, and … Montalvo: Paragraph [0130] (“In a further embodiment, the determination of a demand reduction action from a hierarchy is performed using artificial intelligence supplemented by neural networks. For example, the neural networks may apply weightings to demand reduction actions in view of the monitoring data indicating interior space temperature and available daylight, to provide that the determined demand reduction action may implement an energy curtailment objective with minimal undesired impact on the end user.”)
Montalvo does not expressly teach “updating the predictive model comprises updating the weights.”  However, Barrett teaches:
updating the predictive model comprises updating the weights. Barrett: Paragraphs [0047], [0101], and [0102] [As described in claim 1.] Barrett: Paragraph [0029] (“In some instances, the ACPLHVAC may be adjusted by the user or an administrator to give different goals more weight through its reinforced learning reward system. In some instances, user comfort level may be weighted more heavily than energy cost. In some instances, energy costs may be weighted more heavily. In some instances, other factors such as season, environment, demographics of occupancy, and/or the like may be weighted by the system. In some instances, all these weights may be adjustable both prior to and during ACPLHVAC operation.”) [The optimization of the reinforcement learning methods reads on “update the predictive model” and the adjustable weight reads on “updating the weights”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 3, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the determination of the value of the reinforcement signal further takes into consideration the at least one room characteristic.”  However, Barrett teaches:
The method of claim 1, wherein the determination of the value of the reinforcement signal further takes into consideration the at least one room characteristic. Barrett: Paragraph [0036] [As described in claim 1.] [The reinforced learning component considering if the heat is on, if the heat is off, the MOS of occupants, and/or the like reads on “the at least one room characteristic”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 4, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic is received from the environment controller via the communication interface of the training server.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic is received from the environment controller via the communication interface of the training server. Barrett: Paragraph [0021] (“In one instance of the ACPLHVAC, demographic information of occupants may be used to determine shared optimizations across spaces, such as age compatibility. In some instances of the ACPLHVAC, units may be pre-loaded with state space specifics, such as information about the size and shape of the space, buildings of similar specifications, occupancy background and size, historical data and/or the like. In some instances of the ACPLHVAC, units may contain no pre-loaded information.”) [As shown in FIG. 5, the interface bus 507 interfaces between user input devices and/or peripheral devices and the ACPLHVAC controller.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 5, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic comprises at least one of the following: a room type identifier selected among a plurality of room type identifiers, one or more geometric characteristics of the room, and a human activity in the room.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic comprises at least one of the following: a room type identifier selected among a plurality of room type identifiers, one or more geometric characteristics of the room, and a human activity in the room. Barrett: Paragraph [0021] (“In one instance of the ACPLHVAC, demographic information of occupants may be used to determine shared optimizations across spaces, such as age compatibility. In some instances of the ACPLHVAC, units may be pre-loaded with state space specifics, such as information about the size and shape of the space, buildings of similar specifications, occupancy background and size, historical data and/or the like. In some instances of the ACPLHVAC, units may contain no pre-loaded information.”) 
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 6, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic comprises a human activity in the room, the human activity in the room comprising at least one of the following: periods of time when the room is occupied by humans, and a type of activity performed by humans occupying the room.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic comprises a human activity in the room, Barrett: Paragraph [0028] (“This also allows users to move through a number of comfort settings over the course of a day, to reflect how they are feeling in response to activities or stimuli (for example, a user may return to the environment after exercising and prefer cooler conditions. Later in the day, as their internal temperature returns to normal, they desire to warm the environment again). In one embodiment, the ACPLHVAC may automatically adjust itself in response to these inputs in real-time.”) the human activity in the room comprising at least one of the following: 
periods of time when the room is occupied by humans, and Barrett: Paragraph [0036] (“…The unit begins in a state s that may include such things as the current time, … the MOS of occupants, and/or the like.”) Barrett: Paragraph [0098] (“…optimise the number of air changes per hour based on the occupancy of the room…”)
a type of activity performed by humans occupying the room. Barrett: Paragraph [0028] [As described above.] [Exercising reads on “a type of activity”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 7, this claim incorporates the rejection presented in claim 1. Montalvo further teaches: 
The method of claim 1, wherein the at least one current environmental characteristic value comprises at least one of the following: a current temperature, a current humidity level, a current carbon dioxide (CO2) level, and a current room occupancy. Montalvo: Paragraph [0093] (“The monitor 200 is a device that may be connected to the appliances 170, and conventional electronic environmental sensors, such as electronic temperature and relative humidity sensors located within an interior and exterior to a facility of the end user. The monitor 200 may generate or collect data representative of environmental information available at the sensors.”) [The collected data from the temperature and humidity sensors reads on “a current temperature, a current humidity level”.]
Regarding claim 8, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one updated environmental characteristic value comprises at least one of the following: an updated temperature, an updated humidity level, and an updated carbon dioxide (CO2) level.”  However, Barrett teaches:
The method of claim 1, wherein the at least one updated environmental characteristic value comprises at least one of the following: an updated temperature, an updated humidity level, and an updated carbon dioxide (CO2) level. Barrett: Paragraphs [0038] and [0087] [As described in claim 1.] [The return reading of the room temperature from the thermostat sensor at the ACPLHVAC in a next state reads on “an updated temperature”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 9, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the at least one set point comprises at least one of the following: a target temperature, a target humidity level, and a target CO2 level. Montalvo: Paragraph [0121] (“…the temperature setpoint of an A/C system to a lower temperature during morning hours to pre-cool interior space of the facility before the afternoon…”)
Regarding claim 10, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the one or more commands for controlling the appliance include at least one of the following: 
a command for controlling a speed of a fan, Montalvo: Paragraph [0119] (“…the demand reduction action may be to reset an existing speed setting of a fan, blower and/or pump of an appliance 170…”)
a command for controlling a pressure generated by a compressor, and a command for controlling a rate of an airflow through a valve.
Claims 11-19 recite a training server that implements the functions of the method of claims 1, 3, 2, and 5-10, respectively, with substantially the same limitations.  Therefore, the rejections applied to claims 1, 3, 2, and 5-10 above also apply to claims 11-19, respectively.
Regarding independent claim 20, Montalvo teaches:
A non-transitory computer program product comprising instructions executable by a processing unit of a training server, the execution of the instructions by the processing unit providing for generating a predictive model of a neural network used for controlling an appliance by: Montalvo: Paragraph [0060] (“Referring to FIG. 3, in one exemplary embodiment, the DR server 14 may include a processor 100, a memory 102, a communications network interface device 108 and other components typically present in a general purpose computer.”) Montalvo: Paragraph [0061] (“The memory 102 stores information accessible by the processor 100, including instructions 104 that may be executed by the processor 100.”) Montalvo: Paragraph [0062] (“The processor 100 may be any well-known processor, such as processors from Intel Corporation or AMD.”) Montalvo: Paragraph [0063] (“The instructions 104 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor 100. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.”) Montalvo: Paragraph [0074] (“…the processor 100 may use, when implementing artificial intelligence methodologies, and desirably in conjunction with use of neural networks…”) [The use of neural networks when implementing artificial intelligence methodologies reads on “generating a predictive model of a neural network”.]
The remaining recitations of independent claim 20 implement the functions of the method of independent claim 1 with substantially the same limitations.  Therefore, the rejection applied to independent claim 1 above also applies to independent claim 20.
It is noted that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ryu, S.H. and Moon, H.J., 2016. Development of an occupancy prediction model using indoor environmental data based on machine learning techniques. Building and Environment, 107, pp.1-9 explains that a fundamental objective of energy-efficient buildings is to provide a comfortable environment for occupants while keeping energy consumption minimal. Information regarding occupancy in buildings is a key component of achieving this objective, because occupant presence and behavior have a significant impact on space heating, cooling, and ventilation demand, on the energy consumption of lighting and appliances, and on building controls.
Hafner, R. and Riedmiller, M., 2011. Reinforcement learning in feedback control. Machine Learning, 84(1), pp.137-169 describes four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics, varying set-points, long-term dynamic effects, influence of external variables, and the primacy of precision. The authors propose performance measures for controller quality that apply both to classical control design and to learning controllers, measuring the precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach and provides information about the efficiency of the method with respect to the learning effort needed.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALICIA M. CHOI whose telephone number is (571)272-1473.  The examiner can normally be reached on Monday - Friday 7:30 am to 5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio Del Mar Perez-Velez can be reached on 571-270-5935.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALICIA M. CHOI/Patent Examiner, Art Unit 2117