DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Fan et al. (US 20190187635 A1) in view of Ohta et al. (US 20200134891 A1) and Brunstetter (US 20190075687 A1).
Regarding claim 1 Fan et al. teaches, an action optimization device for optimizing an action for controlling an environment in a target space (system having device to control an environmental system for a man-made structure, [0015]), comprising a processor and a memory connected to the processor (processor receiving instructions and data from memory [0085]), wherein the processor is configured to:
acquire environmental data related to a state of the environment in the target space (environmental data captured by sensors that monitor the environment within the man-made structure, [0018]);
train an environment reproduction model (machine learning model, [0041]) such that, when a state of an environment and an action for controlling the environment are input, a correct answer value of an environmental state after the action is output, and store the trained environment reproduction model in the memory (the machine learning model stored on the memory is trained in a supervised manner with known environmental states and actions and the environmental states observed after some time interval, i.e., training set data, [0041], [0055] and [0085]);
read the trained environment reproduction model stored in the memory (machine learning model stored in the memory, [0085]), and predict a second environmental state corresponding to a first environmental state and a first action by using the trained environment reproduction model read (trained machine learning model predicts the environment in the room for some time in the future based on current state and feedback (action), [0039] and [0071]-[0073]);
output a result of the exploration (deciding which action to take from a state by employing exploration, [0082]).
	Fan et al. does not teach the detail of performing time/space interpolation on the acquired environmental data according to a preset algorithm and training an exploration model which explores for an action to be taken for the second environmental state and outputting a result of the exploration. However, Fan et al. explicitly teaches acquiring data which are used by the machine learning model to predict future environmental states in [0039] and [0071]-[0073], and the controller explores different courses of action to be taken based on the predicted state in [0073]. In Fan et al. one model is performing the combined action of predicting states and determining course of action based on the 
	Ohta et al. teaches, perform time/space interpolation on the acquired environmental data according to a preset algorithm (“...so as to calculate continuous values in terms of time or space, and perform interpolation of environmental data values of the air-conditioned space of discrete values acquired by the environmental data value acquisition unit, so as to calculate continuous values in terms of time or space...” [0052] and Fig.6).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of training the environmental reproduction model based on acquired environmental data and using the trained environmental reproduction model to predict state of the target space based on current environmental state and course of action of the target space as taught by Fan et al. where time/space interpolation is performed on the acquired environmental data as taught by Ohta et al. to improve data accuracy for the environmental reproduction model. 
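
	For illustration only, the interpolation of discrete environmental data values into continuous values in terms of time can be sketched as follows. This is a minimal sketch that assumes simple linear interpolation; it is not a characterization of the particular algorithm disclosed in Ohta et al. The function name `interpolate_linear`, the clamping behavior outside the sampled range, and the sample readings are all hypothetical.

```python
from bisect import bisect_right


def interpolate_linear(times, values, query_time):
    """Linearly interpolate discrete sensor samples at query_time.

    times must be strictly increasing. Queries outside the sampled range
    are clamped to the nearest endpoint (an assumption of this sketch).
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    if query_time <= times[0]:
        return values[0]
    if query_time >= times[-1]:
        return values[-1]
    # Index of the first sample strictly after query_time.
    hi = bisect_right(times, query_time)
    lo = hi - 1
    t0, t1 = times[lo], times[hi]
    v0, v1 = values[lo], values[hi]
    weight = (query_time - t0) / (t1 - t0)
    return v0 + weight * (v1 - v0)


# Hypothetical temperature readings: (minutes since start, degrees C).
sample_times = [0.0, 10.0, 20.0]
sample_temps = [22.0, 24.0, 23.0]
estimate = interpolate_linear(sample_times, sample_temps, 5.0)
```

	The same weighting can be applied along a spatial coordinate instead of time when sensors are placed at known positions.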
	Neither in combination nor individually do Fan et al. and Ohta et al. teach training an exploration model which explores for an action to be taken for the second environmental state and outputting a result of the exploration. However Fan et al. explicitly teaches acquiring data which are used by the machine learning model to predict future 
	Brunstetter teaches, train an exploration model such that an action to be taken next is output when an environmental state output from the environment reproduction model is input (training the neural network model with set of acquired data (including room temperature and humidity data), [0025] and [0007]), and store the trained exploration model in the memory (system 10 implementing the method including a controller having a processor and memory, [0024]);
	read the trained exploration model stored in the memory, and explore for a second action to be taken for the second environmental state by using the trained exploration model read (the output of the machine learning model is input to the neural network model (exploration model) which is used by the controller to determine supply air temperature and operating parameter of the cooling unit (action to be taken) [0012]).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device as taught by combination of Fan et al. and Ohta et al. to include a separate trained exploration model as taught by Brunstetter instead of combined environment 
	Fan et al. teach:
[0018] The control system 150 can receive various types of inputs, and from various sources. This includes environmental data 131 captured by sensors 130 that monitor the environment within the man-made structure. Examples include temperature, humidity, pressure and air quality data. Air quality might include the concentration of allergens or of particulates of a certain size. It might also include the detection of certain substances: carbon monoxide, smoke, fragrances, negative ions, or other hazardous or desirable substances. Environmental data 131 can also include lighting levels and lighting color.

[0041] The training module receives 511 a training set for training the machine learning model in a supervised manner. Training sets typically are historical data sets of inputs and corresponding responses. The training set samples the operation of the environmental system, preferably under a wide range of different conditions. FIG. 3A gives some examples of input data 310 that may be used for a training set. The corresponding responses are observations after some time interval, such as the actual temperature and humidity achieved, energy consumed and cost during the time interval, occupant comfort feedback, etc.

[0055] In typical training 512, a training sample is presented as an input to the machine learning model 153, which then predicts an output for a particular attribute. The difference between the machine learning model's output and the known good output is used by the training module to adjust the values of the parameters (e.g., features, weights, or biases) in the machine learning model 153. This is repeated for many different training samples to improve the performance of the machine learning model 153 until the deviation between prediction and actual response is sufficiently reduced.
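
For illustration only, the training described in [0055], in which the difference between the model output and the known good output is used to adjust the model parameters over many samples, can be sketched as follows. This is a minimal sketch assuming a linear model fitted by stochastic gradient descent; Fan et al. does not limit the machine learning model 153 to this form. The function names, the learning rate, and the synthetic training set are hypothetical.

```python
def train_environment_model(samples, learning_rate=0.1, epochs=200):
    """Fit next_state ~ w_state*state + w_action*action + bias.

    Each sample is (state, action, observed_next_state). The prediction
    error on each sample is used to adjust the parameters.
    """
    w_state, w_action, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for state, action, observed_next in samples:
            predicted = w_state * state + w_action * action + bias
            error = predicted - observed_next  # deviation from known output
            w_state -= learning_rate * error * state
            w_action -= learning_rate * error * action
            bias -= learning_rate * error
    return w_state, w_action, bias


def predict_next_state(params, state, action):
    """Predict the state after an action using the fitted parameters."""
    w_state, w_action, bias = params
    return w_state * state + w_action * action + bias


# Hypothetical, noise-free training set in normalized units.
training_samples = [
    (s, a, 0.8 * s + 0.5 * a + 1.0)
    for s in (0.0, 0.25, 0.5, 0.75, 1.0)
    for a in (-1.0, 0.0, 1.0)
]
params = train_environment_model(training_samples)
```

Training stops here after a fixed number of passes; a stopping rule based on the remaining deviation, as [0055] describes, could be substituted.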


[0085] Alternate embodiments are implemented in computer hardware, firmware, software, and/or combinations thereof. Implementations can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions by operating on input data and generating output. Embodiments can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.....
[0039] ... training 510 the machine learning model 153 and inference (operation) 520 of the machine learning model 153. These will be illustrated using an example where the machine learning model learns to predict the environment in rooms (e.g., temperature, humidity, lighting) and the energy consumption/cost based on historical data. The following example will use the term “machine learning model” but it should be understood that this is meant to also include an ensemble of machine learning models.

[0072] Predicted cost of the consumed energy. The controller 159 controls 524 the environmental system by using the responses predicted by the machine learning model 153 to make informed decisions.

[0073] FIG. 6 is a block diagram of a control system 150 that uses the machine learning model 153 to evaluate different possible courses of action. In this example, the machine learning model 153 functions as a simulation of the environmental system 110 and the man-made structure with respect to the inputs and responses of interest. The current state 630 of the environment and system are the inputs to the machine learning model 153. For example, the state might include the room temperature being 85 F, humidity being 80%, number of people being 40, outdoor temperature being 95 F, etc. The control system 150 can take different courses of action to affect the environment. For example, the control system can set the temperature, change the fan speed, change the mode of operation, or it can do nothing and keep the current settings.
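
For illustration only, the use of a predictive model as a simulation to evaluate different possible courses of action, as described in [0073], can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_predict` is a hypothetical stand-in for the trained model, and the 72 F comfort target and the candidate setpoints are hypothetical values. Only the 85 F starting temperature is taken from the example in [0073].

```python
def select_action(current_state, candidate_actions, predict, score):
    """Return the candidate action whose predicted outcome scores highest."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        predicted_state = predict(current_state, action)
        value = score(predicted_state)
        if value > best_score:
            best_action, best_score = action, value
    return best_action, best_score


def toy_predict(temp_f, setpoint_f):
    """Hypothetical model: the room moves halfway toward the setpoint."""
    return temp_f + 0.5 * (setpoint_f - temp_f)


def comfort_score(temp_f, target_f=72.0):
    """Hypothetical metric: closer to the target temperature is better."""
    return -abs(temp_f - target_f)


# Candidate setpoints; 85.0 corresponds to keeping the current settings.
action, value = select_action(85.0, [68.0, 72.0, 85.0], toy_predict, comfort_score)
```

In this toy run the 68 F setpoint is selected because its predicted temperature after one interval is closest to the assumed target.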

[0082] To decide which action to take from a state, the control system 150 may employ techniques of exploitation and exploration. Exploitation refers to utilizing known information. For example, a past sample shows that under certain conditions, a particular action was taken, and good results were achieved. The control system may choose to exploit this information, and repeat this action if current conditions are similar to that of the past sample.
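
For illustration only, one common way to combine exploitation and exploration is an epsilon-greedy rule, sketched below. Fan et al. does not name this rule in the passage quoted above; it is used here as an assumption to make the distinction concrete. The action names and the estimated values of past results are hypothetical.

```python
import random


def choose_action(estimated_value, epsilon, rng):
    """Pick an action by an epsilon-greedy rule.

    With probability epsilon a random action is tried (exploration);
    otherwise the action with the best known result is repeated
    (exploitation).
    """
    actions = sorted(estimated_value)
    if not actions:
        raise ValueError("at least one action is required")
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=estimated_value.get)


# Hypothetical scores of past results under similar conditions.
past_results = {"hold": 0.2, "lower_setpoint": 0.9, "raise_fan_speed": 0.6}
rng = random.Random(0)
exploit_choice = choose_action(past_results, 0.0, rng)  # always exploits
explore_choice = choose_action(past_results, 1.0, rng)  # always explores
```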

	Brunstetter teach:

[0025] With further reference to FIG. 1, the controller 12 acquires data, as indicated at box 22, which may involve applying one or more filters to filter the data to produce a data set which will be used by subsequent processing operations. At operation 24 the collected (and potentially filtered) data set is used via one or more machine learning modules to choose what data is kept for use by neural network models, to train neural network models using selected training algorithms, for example via the Levenberg-Marquardt training algorithm, as well as to decide when to retrain the neural network models and/or to start from scratch with new data. At operation 26, the data may be used in one or more neural network models, as will be described further in the following paragraphs. At operation 28 one or more optimization routines may run. This may involve choosing when to run optimization as well as running an interior-point global optimization algorithm using the collected data, as will also be described more fully in 

[0007] In one aspect the present disclosure relates to a system for controlling a supply air temperature adjustment for a cooling unit to optimize operation of the cooling unit with respect to at least one of room air temperature and humidity requirements. The system may comprise a controller for implementing a machine learning module configured to select which portion or portions of acquired data pertaining to operation of the cooling unit will be utilized. The controller may also implement a neural network model which uses information supplied by the machine learning module and learns an operational behavior of the cooling unit, and wherein the machine learning module performs supervised learning and regression for the neural network model, and wherein the neural network model uses information supplied by the machine learning module for generating an output. The controller may also implement an optimization module which receives the output from the neural network model and which implements a global optimization routine, using unit power consumption of the cooling unit as the objective function, to produce a supply air temperature set point for use by the cooling unit which optimizes an operating parameter of the cooling unit.


[0012] The optimization module receives the output from the neural network model and implements a global optimization routine, using unit power consumption of the cooling unit as the objective function, to produce a supply air temperature set point for use by the cooling unit which optimizes an operating parameter of the cooling unit.

	Regarding claim 2 combination of Fan et al., Ohta et al. and Brunstetter teach the action optimization device of claim 1. In addition Fan et al. teaches, the processor is further configured to:
predict, based on the explored for second action, a third environmental state corresponding to the second environmental state and the second action by using the trained environment reproduction model (To simulate subsequent states, the control system 150 uses the trained machine learning model 153. When underlying conditions (e.g. weather) are changing, the machine learning model 153 can make predictions on what most likely will be observed as a result of actions taken. Based on these predictions, the control system 150 chooses a policy or action that most likely maximizes the metric of interest... [0081]); and
explore for a third action to be taken for the third environmental state by using the trained exploration model (using exploration to decide which action to take from a state, (trained neural network in view of Brunstetter) [0082] and [0081]).

Regarding claim 6 combination of Fan et al., Ohta et al. and Brunstetter teach the action optimization device according to claim 1. In addition Fan et al. teaches the processor is further configured to: acquire policy data specifying information to be used for at least one processing of training the environment reproduction model, training the exploration model, predicting the second environmental state, and exploring for the second action (the machine learning model works in conjunction with the policy engine to learn to determine certain course of action based on the policy for the predicted state,  [0078] and [0079]).
Fan et al. teach:
[0079] Based on the current state 630, a policy engine 651 determines which policies might be applicable to the current state. This might be done using a rules-based approach, for example. The machine learning model 153 predicts the result of each policy. The different results are evaluated and a course of action is selected 657 and then carried out by the controller 659. A set of metrics is used to evaluate the policies. For example, if the comfort zone is defined as being within a range of temperatures and humidity, then a policy that results in actual temperatures outside the comfort zone for too long when occupants are present is scored poorly. A policy that results in a high volume of occupant complaints is scored poorly. Other example metrics include the 

	Regarding claim 7 combination of Fan et al., Ohta et al. and Brunstetter teach the action optimization device according to claim 1. In addition Fan et al. teaches explore for, as the second action, an action of a group unit for a control target group obtained by grouping a plurality of control targets based on a predetermined criterion in advance (“To decide which action to take from a state, the control system 150 may employ techniques of exploitation and exploration. Exploitation refers to utilizing known information. For example, a past sample shows that under certain conditions, a particular action was taken, and good results were achieved. The control system may choose to exploit this information, and repeat this action if current conditions are similar to that of the past sample.”, [0082]), or a series of actions for one or more control targets for realizing a predetermined function (possible course of actions to be taken evaluated for a predicted state, [0078]).
	Regarding claim 8 Fan et al. teaches, an action optimization method for the action optimization device including a processor and a memory connected to the processor to optimize an action for controlling an environment in a target space (system having device to control an environmental system for a man-made structure, [0015] and processor receiving instructions and data from memory [0085]), the method comprising:
acquiring environmental data related to a state of the environment in the target space (environmental data captured by sensors that monitor the environment within the man-made structure, [0018]);
training an environment reproduction model (machine learning model, [0041]), such that, when a state of an environment and an action for controlling the environment are input, a correct answer value of an environmental state after the action is output, and storing the trained environment reproduction model in the memory (the machine learning model stored on the memory is trained in a supervised manner with known environmental states and actions and afterwards environmental states after some time interval -training set data, [0041], [0055] and [0085]);
reading the trained environment reproduction model stored in the memory (machine learning model stored in the memory, [0085]), and predicting a second environmental state corresponding to a first environmental state and a first action by using the trained environment reproduction model read (trained machine learning model predicts the environment in the room for some time in the future based on current state and feedback (action), [0039] and [0071]-[0073]);
outputting a result of the exploration (deciding which action to take from a state by employing exploration, [0082]).
Fan et al. does not teach the detail of performing time/space interpolation on the acquired environmental data according to a preset algorithm and training an exploration model which explores for an action to be taken for the second environmental state and outputting a result of the exploration. However Fan et al. explicitly teaches acquiring 
Ohta et al. teaches, performing time/space interpolation on the acquired environmental data according to a preset algorithm (“...so as to calculate continuous values in terms of time or space, and perform interpolation of environmental data values of the air-conditioned space of discrete values acquired by the environmental data value acquisition unit, so as to calculate continuous values in terms of time or space...” [0052] and Fig.6).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of training the environmental reproduction model based on acquired environmental data and using the trained environmental reproduction model to predict state of the target space based on current environmental state and course of action of the target space as taught by Fan et al. where time/space interpolation is performed on the acquired environmental data as taught by Ohta et al. to improve data accuracy for the environmental reproduction model. 

	Brunstetter teaches, training an exploration model such that an action to be taken next is output when an environmental state output from the environment reproduction model is input (training the neural network model with set of acquired data (including room temperature and humidity data), [0025] and [0007]), and storing the trained exploration model in the memory (system 10 implementing the method including a controller having a processor and memory, [0024]);
reading the trained exploration model stored in the memory, and exploring for a second action to be taken for the second environmental state by using the trained exploration model read (the output of the machine learning model is input to the neural network model (exploration model) which is used by the controller to determine supply air temperature and operating parameter of the cooling unit (action to be taken) [0012]).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device as taught by combination of Fan et al. and Ohta et al. to include a separate trained exploration model as taught by Brunstetter instead of combined environment reproduction model and exploration model taught in Fan et al. to divide the burden of computation and determining course of action between two models to improve overall system computation and implementation accuracy.  

Regarding claim 9 Fan et al. teaches, a non-transitory tangible computer-readable storage medium having stored thereon a program for optimizing an action for controlling an environment in a target space (program stored in storage device where the storage device includes magnetic disks, hard disk, removable disks, etc., and the stored program when executed by a processor is to control an environmental system for a man-made structure, [0015]), the program comprising instructions for causing a processor to execute:
acquiring environmental data related to a state of the environment in the target space (environmental data captured by sensors that monitor the environment within the man-made structure, [0018]);
training an environment reproduction model (machine learning model, [0041]), such that, when a state of an environment and an action for controlling the environment are input, a correct answer value of an environmental state after the action is output (the machine learning model stored on the memory is trained in a supervised manner with known environmental states and actions and afterwards 
predicting a second environmental state corresponding to a first environmental state and a first action by using the environment reproduction model (trained machine learning model predicts the environment in the room for some time in the future based on current state and feedback (action), [0039] and [0071]-[0073]);
outputting a result of the exploration (deciding which action to take from a state by employing exploration, [0082]).
Fan et al. does not teach the detail of performing time/space interpolation on the acquired environmental data according to a preset algorithm and training an exploration model which explores for an action to be taken for the second environmental state and outputting a result of the exploration. However, Fan et al. explicitly teaches acquiring data which are used by the machine learning model to predict future environmental states in [0039] and [0071]-[0073], and the controller explores different courses of action to be taken based on the predicted state in [0073]. In Fan et al., one model performs the combined action of predicting states and determining a course of action based on the predicted state, instead of two models where one predicts the state and another determines the course of action based on the state predicted by the other model. But Fan et al. mentions in [0039] that an ensemble of machine learning models can be used to perform the combined actions.
	Ohta et al. teaches, performing time/space interpolation on the acquired environmental data according to a preset algorithm (“...so as to calculate continuous values in terms of time or space, and perform interpolation of environmental data values of the air-conditioned space of discrete values acquired by the environmental data value acquisition unit, so as to calculate continuous values in terms of time or space...” [0052] and Fig.6).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to apply the teachings of training the environmental reproduction model based on acquired environmental data and using the trained environmental reproduction model to predict state of the target space based on current environmental state and course of action of the target space as taught by Fan et al. where time/space interpolation is performed on the acquired environmental data as taught by Ohta et al. to improve data accuracy for the environmental reproduction model. 
	Neither in combination nor individually do Fan et al. and Ohta et al. teach training an exploration model which explores for an action to be taken for the second environmental state and outputting a result of the exploration. However, Fan et al. explicitly teaches acquiring data which are used by the machine learning model to predict future environmental states in [0039] and [0071]-[0073], and the controller explores different courses of action to be taken based on the predicted state in [0073]. In Fan et al., one model performs the combined action of predicting states and determining a course of action based on the predicted state, instead of two models where one predicts the state and another determines the course of action based on the state predicted by the other model. But Fan et al. mentions in [0039] that an ensemble of machine learning models can be used to perform the combined actions.
	Brunstetter teaches, training an exploration model such that an action to be taken next is output when an environmental state output from the environment reproduction model is input (training the neural network model with set of acquired data (including room temperature and humidity data) and the output of the machine learning model is input to the neural network model (exploration model) which is used by the controller to determine supply air temperature and operating parameter of the cooling unit (action to be taken) [0012], [0025] and [0007]);
exploring for a second action to be taken for the second environmental state by using the exploration model (output of the machine learning model is input to the neural network model (exploration model) which is used by the controller to determine supply air temperature and operating parameter of the cooling unit (action to be taken) [0012]).
Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device as taught by combination of Fan et al. and Ohta et al. to include a separate trained exploration model as taught by Brunstetter instead of combined environment reproduction model and exploration model taught in Fan et al. to divide the burden of computation and determining course of action between two models to improve overall system computation and implementation accuracy.  


Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Fan et al. (US 20190187635 A1) in view of Ohta et al. (US 20200134891 A1) and Brunstetter (US 20190075687 A1) and Chaudhury et al. (US 20190385061 A1).
Regarding claim 3, Fan et al. teaches, the processor is further configured to: when predicting a second environmental state corresponding to a first environmental state and a first action by using the trained environment reproduction model (trained machine learning model can predict subsequent state based on what most likely will be observed as a result of action taken, [0081]);
when exploring for a second action to be taken for the second environmental state by using the trained exploration model (control system employing exploration to decide which action to take from a state, [0082]),
Neither in combination nor individually do Fan et al., Ohta et al. and Brunstetter teach outputting a reward corresponding to the second environmental state based on a preset reward function and updating a training result of the exploration model based on the reward. However, Fan et al. explicitly teaches continually updating the trained machine learning model with newly captured data in [0058].
Chaudhury et al. teaches, output a reward corresponding to the second environmental state based on a preset reward function (model-free updated based on an estimated reward function computed from inverse reinforcement learning, [0022] and [0007], also see [0052]);
update a training result of the exploration model based on the reward (using reinforcement learning to update the model using rewards, state, action and next state, [0022] and [0007]).
Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device as taught by combination of Fan et al., Ohta et al. and Brunstetter to output a reward corresponding to a state based on reward function and update the exploration model based on the reward (reinforcement learning) as taught by Chaudhury et al. to properly infer course of action based on predicted state as mentioned by Chaudhury et al. in [0058],
“Once the dynamics model is fully learnt, it provides a differentiable model of the environment through which we can back propagate. Hence, even if only the expert agent's state trajectory is available during learning, we can infer the gradient signal with respect to the corresponding expert action by back-propagating through the learnt dynamics model. Assuming that the dynamics model learnt is accurate enough, this is equivalently to behavior cloning from expert state and action pairs.”

	Chaudhury et al. teach:
[0022] In an embodiment, the present invention provides an approach for learning the optimal policy from optimal demonstration data that includes only optimal states and not actions. To that end, the present invention learns the next state predictor, from the optimal state trajectory data, using a Long Short-Term Memory (LSTM) or Dynamic Boltzmann Machine (DyBM) to predict the next state. The training of an agent is commenced in a closed-loop manner. In the training of the agent, the following steps can be repeated until convergence is reached or a number of performed steps is reached that is less than a threshold (e.g., max_steps): (a) start a model-free update using an estimated reward function computed from inverse reinforcement learning; (b) start gathering state, action, and next state triplets during the model-free exploration and periodically train the dynamics model; and (c) fix the dynamic model weights and train end-to-end using expert demonstration data.

[0052] The model-free inverse reinforcement learning 810 involves a model-free policy (π.sub.mf(a.sub.t|s.sub.t)) 811, an environment 812, a next state predictor 813, and an operator 814. The model-free policy 811 and the next state predictor 813 receive a state s.sub.t, and the model-free policy 811 outputs an action a.sub.t (to the environment 812) while the next state predictor outputs next state ŝ.sub.t+1 (to the operator 814). The operator 814 receives s.sub.t+1 and ŝ.sub.t+1 and outputs reward function r.sub.t.



[0007] According to a further aspect of the present invention, a computer processing system is provided for learning an action policy. The computer processing system includes a memory for storing program code. The computer processing system further includes a processor, operatively coupled to the memory, for running the program code to learn a predictor which predicts a next state using trajectories of expert states. The processor further runs the program code to perform model-free inverse reinforcement learning using rewards estimated by using the predictor to sample environment dynamics including triplets of a state, an action, and a next state. The state in each of the triplets is an expert state. The processor also runs the program code to train, using the sampled environment dynamics as training data, a dynamics model which obtains a pair of the state and the action as an input and outputs, for each next state, state-transition probabilities to provide a trained dynamics model.

	Regarding claim 4, the combination of Fan et al., Ohta et al. and Brunstetter teaches the action optimization device of claim 1. In addition, Fan et al. teaches, explore for an action to be taken by using the environment prediction data for the exploration model (to decide which action to take from a predicted state, the control system with the machine learning model employs exploration techniques, [0081] and [0082]).
	Fan et al., Ohta et al. and Brunstetter, neither individually nor in combination, teach performing future prediction using a preset time series analysis method.
	Chaudhury et al. teaches, perform future prediction by using a preset timeseries analysis method based on the environmental data to generate environment prediction data (“FIG. 5 is a block diagram showing an exemplary time series predictive model 500 for the next state, in accordance with an embodiment of the present invention. The time series predictive model 500 can be used to learn the next state predictor per block 410 [footnote 7]. The time series predictive model 500 can include a LSTM or DyBM for every next state to be predicted. Hence, from state s.sub.t, next state ŝ.sub.t+1 is predicted by LSTM or DyBM 510, and from state s.sub.t+1, next state ŝ.sub.t+2 is predicted by LSTM or DyBM 520.”, [0038] and [0006]).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device as taught by the combination of Fan et al., Ohta et al. and Brunstetter to perform time series analysis based on environmental data to generate environment prediction data as taught by Chaudhury et al., to help the machine learning models better understand the causes of trends and systematic patterns related to the predicted state and the course of action to be taken over time.
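For illustration only, one-step-ahead prediction of the kind quoted from paragraph [0038] can be sketched in Python as follows. The sketch substitutes a first-order autoregressive fit for the LSTM or DyBM named in the reference; that substitution, the scalar state, and the function names `fit_ar1` and `predict_next_states` are assumptions made to keep the example short, not the method of Chaudhury et al.

```python
from typing import List, Sequence


def fit_ar1(series: Sequence[float]) -> float:
    """Least-squares fit of s_{t+1} ~ coeff * s_t.

    ASSUMPTION: a scalar AR(1) model stands in for the LSTM or DyBM
    next state predictor named in the quoted paragraph.
    """
    numerator = sum(x * y for x, y in zip(series[:-1], series[1:]))
    denominator = sum(x * x for x in series[:-1])
    if denominator == 0:
        raise ValueError("series must contain a non-zero state before the last one")
    return numerator / denominator


def predict_next_states(series: Sequence[float], coeff: float) -> List[float]:
    """Predict the next state from each observed state.

    Each prediction uses the observed state at that step, mirroring the
    quoted flow: s_t gives s^_{t+1}, and s_{t+1} gives s^_{t+2}.
    """
    return [coeff * s for s in series]
```

A possible use is `predict_next_states(history, fit_ar1(history))`, where `history` is a hypothetical list of past sensor readings.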
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Fan et al. (US 20190187635 A1) in view of Ohta et al. (US 20200134891 A1) and Brunstetter (US 20190075687 A1) and Kumaresan et al. (US 20220019678 A1).
Regarding claim 5, the combination of Fan et al., Ohta et al. and Brunstetter teaches the action optimization device according to claim 1. In addition, Fan et al. teaches, train the environment reproduction model by using the environmental data (based on the environmental data, the machine learning model (environment reproduction model) is trained, [0041] and [0039]).
Fan et al., Ohta et al. and Brunstetter, neither individually nor in combination, teach performing data augmentation on the environmental data based on a random number.
	Kumaresan et al. teaches, perform data augmentation on the environmental data based on a random number (“..It will also be appreciated that the irreversible transform of step 201 can include first augmenting the first data element by applying a unique salt value and subsequently generating a pseudo-random number with the augmented first data element as input seed [footnote 8], or applying a hash function to the augmented first data element, or any combination of these techniques”, [0054]).
	Therefore it would have been obvious before the effective filing date of the claimed invention to a person of ordinary skill in the art to modify the action optimization device using pre-processed environmental data for training as taught by the combination of Fan et al., Ohta et al. and Brunstetter, wherein the pre-processing involves data augmentation based on a random number as taught by Kumaresan et al., to improve the deep learning robustness of the machine learning models.
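For illustration only, the claim limitation "data augmentation on the environmental data based on a random number" can be sketched in Python as follows. The sketch assumes Gaussian noise drawn from a seeded pseudo-random number generator; it is a generic example of the limitation and does not reproduce the salt and hash transform quoted from Kumaresan et al. The function name `augment_with_noise` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def augment_with_noise(readings: Sequence[float], copies: int,
                       noise_std: float, seed: int) -> List[List[float]]:
    """Create noisy copies of a list of sensor readings.

    ASSUMPTION: augmentation adds zero-mean Gaussian noise to each
    reading. The seed makes the random numbers reproducible.
    """
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, noise_std) for value in readings]
        for _ in range(copies)
    ]
```

Calling the function twice with the same seed returns identical augmented copies, which keeps a training run repeatable.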
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Elbsat et al. (US 20150316907 A1) teaches a building management system including sensors that measure time series values of building variables and a deterministic model generator that uses historical values for the time series of building variables to train a deterministic model that predicts deterministic values 
Constantin et al. (US 20190252079 A1) teaches systems and methods using multiple machine learning models working together to predict a state and determine a course of action based on the predicted state. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANZUMAN SHARMIN whose telephone number is (571)272-7365. The examiner can normally be reached M and Th 7:30am - 3:30pm and Tue 8:00am-12:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THOMAS LEE can be reached on (571)272-3667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.



/ANZUMAN SHARMIN/Examiner, Art Unit 2115                                                                                                                                                                                                        




/THOMAS C LEE/Supervisory Patent Examiner, Art Unit 2115

        1 To calculate continuous values by interpolating acquired environmental data in space or time, a preset algorithm must be used. Calculation is not possible without an equation or algorithm. 
        2 Observing the aftereffects of an action taken.
        3 Fan et al. teaches one model that predicts states and, based on the predicted state, explores the course of action to be taken. Brunstetter teaches that the predicted state (output) of one machine learning model is used as an input to another trained neural network model, in conjunction with a controller, to determine the course of action to be taken, as taught in [0007] and [0012].
        4 Determining/predicting subsequent states in the future.
        5 One of ordinary skill in the art can train the machine learning model to apply the course of action taken for one space inside a man-made structure ([0015]) to another space or spaces inside the man-made structure having the same or similar environmental dynamics to yield predictable results. See MPEP 2143(I)(D).
        6 Fan et al. teaches one model that predicts states and, based on the predicted state, explores the course of action to be taken. Brunstetter teaches that the predicted state (output) of one machine learning model is used as an input to another trained neural network model, in conjunction with a controller, to determine the course of action to be taken, as taught in [0007] and [0012].
        7 Preset time series analysis to predict the next environmental state in view of [0006] of Chaudhury et al.
        8 Augmented first data element is the pre-processed environmental data used for training in view of Fan et al.