DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after December 9, 2016, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are currently pending and under examination, of which claims 1, 11, and 20 are independent claims. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/07/2018 complies with the provisions of 37 CFR 1.97. Accordingly, the Examiner is considering the references in the IDS with a signed and initialed copy being attached hereto.

Drawings
The drawings are objected to under 37 CFR 1.83(a) because there are extraneous vertical lines in FIG. 7A (below step 530) that do not connect to any other item or step in the figure and are not associated with a reference number. Any detail that is essential for a proper understanding of the disclosed invention should be shown in the drawing. MPEP § 608.02(d). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure 

Abstract
The abstract of the disclosure is objected to because the form and legal phraseology often used in patent claims, such as "consisting of", should be avoided.  
Appropriate correction is required.

35 USC § 112(f) Analysis
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 
Claim 11 is interpreted under 35 U.S.C. 112(f), as reciting means for performing a specified function.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification, as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Independent claim 11 uses a generic placeholder, “a processing unit” that is coupled with functional language (i.e., “for receiving… receiving… determining... transmitting…receiving…determining…executing…”) without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. For purposes of examination and in accord with Paragraph [0034] of the Specification, as published, “processing unit” will be construed as “one or more processors”.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Montalvo (US Patent Publication No. 2010/0088261 A1) (“Montalvo”) in view of Barrett (US Patent Publication No. 2016/0223218 A1) (“Barrett”).
Regarding independent claim 1, Montalvo teaches:
A method for generating a predictive model for controlling an appliance, the method comprising: Montalvo: Claim 1 (“…method for fully automated energy demand curtailment not requiring human invention…”) Montalvo: Paragraph [0029] (“The apparatus also includes a generator to generate a signal or control data to implement one or more demand reduction actions at least one of an appliance…”) Montalvo: Paragraph [0074] (“…implementing artificial intelligence methodologies…”) [Implementing artificial intelligence methodologies reads on “generating a predictive model”.]
storing in a memory of a training server a predictive model allowing a neural network inference engine to infer one or more outputs based on inputs; Montalvo: Paragraph [0060] (“Referring to FIG. 3, in one exemplary embodiment, the DR server 14 may include a processor 100, a memory 102, a communications network interface device 108…”) Montalvo: Paragraph [0074] (“…the processor 100 may use, when implementing artificial Montalvo: Paragraph [0077] (“Also, the instructions 104 may include instructions that the processor 100 may execute to generate demand reduction action signals to be transmitted by the communications device 108. These demand reduction action signals may be transmitted to (i) a DR client 16 to provide for control operation of appliances at an end user by the DR client 16, and (ii) a SES device 18 to provide for control of generation and supply of supplemental electrical power to the end user from a supplemental energy source, during the course of a DR event.”) [The artificial intelligence methodologies stored in the memory of the DR server 14 read on “storing in a memory of a training server a predictive model”, and the processor reads on “a training server”. The use of the neural networks reads on “a neural network inference engine”.]
…
receiving by a processing unit of the training server at least one room characteristic; Montalvo: Paragraph [0060] [As described above.] Montalvo: Paragraph [0084] (“The DR client 16 may be configured similarly to the DR server 14, with a processor 130 and a memory 132 containing instructions 134 and data 136.”) Montalvo: Paragraph [0093] (“The monitor 200 is a device that may be connected to the appliances 170, and … electronic temperature and relative humidity sensors located within an interior and exterior to a facility of the end user. The monitor 200 may generate or collect data representative of environmental information available at the sensors, and also of the operating status of appliances, such as whether the appliance is ON or OFF... The monitor 200 transmits the generated or collected data to the DR client 16 in substantially real time, where the DR client 16 converts the received data into monitoring [The generated or collected data of the environmental information reads on “at least one room characteristic”.]
receiving by the processing unit at least one current environmental characteristic value and at least one set point from an environment controller via a communication interface of the training server; Montalvo: Paragraphs [0060], [0084], and [0093] [As described above.] Montalvo: Paragraph [0085] (“The user input 142, for example, may serve as an interface that permits a human (“operator”) at an end user to modify set points, such as environmental limits and appliance operation limits, or to opt out of participation in a DR event prior to and/or during a DR event.”) Montalvo: Paragraph [0151] (“…the processor 100 may evaluate monitoring data representative of temperature and/or humidity in the interior space of a facility of an end user impacted by demand rolling, so that the demand reduction action may provide that appliances with certain electric loads are reinstated (turned ON) and other appliances already closest to temperature and/or humidity set points agreed upon by the end user are turned OFF, where different spaces or facilities impacted by the demand rolling may have different set points.”) [The generated or collected data received at the processor 100, 130 reads on “receiving by the processing unit at least one current environmental characteristic value”. The communications network interface device 108 reads on “via a communication interface”. The set points, such as environmental limits and appliance operation limits, read on “at least one set point from an environment controller”.]
…
transmitting by the processing unit the one or more commands for controlling the appliance to the environment controller via the communication interface; Montalvo: Paragraph [0132] (“The processor 100, in turn, in substantially real time, processes the feedback [The demand reduction action from the processor to turn off the appliance reads on “transmitting by the processing unit the one or more commands for controlling the appliance”.]
…
executing by the processing unit a neural network training engine to update the predictive model based on: inputs comprising the at least one current environmental characteristic value, the at least one set point, and the at least one room characteristic; Montalvo: Paragraphs [0074], [0077], and [0060] [As described above.] Montalvo: Paragraph [0122] (“In block 308, the processor 100 of the DR server 14 generates, and transmits over the network 20, a demand reduction action signal to implement the demand reduction action(s) determined at block 306.”) [The processor 100, 130 reads on “executing by the processing unit” and the neural networks read on “a neural network training engine”. The monitoring data reads on “inputs”. The space temperature reads on “the at least one current environmental characteristic value” and the temperature setting reads on “the at least one set point”. The generated or collected data reads on “the at least one room characteristic”.]
one or more outputs consisting of the one or more commands; and… Montalvo: Paragraphs [0060], [0074], [0077], and [0122] [As described above.] [The transmission of the demand reduction action reads on “one or more outputs consisting of the one or more commands”.]
Montalvo does not expressly teach “storing in the memory a set of rules… determining by the processing unit one or more commands for controlling an appliance based on the at least one current environmental characteristic value, the at least one set point and the at least one room characteristic; …receiving by the processing unit at least one updated environmental characteristic value from the environment controller via the communication interface; determining by the processing unit a value of a reinforcement signal based on the at least one set point, the at least one updated environmental characteristic value and the set of rules, the value of the reinforcement signal being one of positive reinforcement or negative reinforcement; and… executing by the processing unit a neural network training engine to update the predictive model based on: …the value of the reinforcement signal”. However, Barrett describes automated control and parallel learning HVAC apparatuses, methods and systems (“ACPLHVAC”) that update real-time value function estimates through parallel and reinforcement learning, via ACPLHVAC components, by observing a defined state action space in order to maximize user Quality of Experience (QoE) and minimize the energy required to regulate environmental spaces. Barrett describes:
storing in the memory a set of rules; Barrett: Paragraph [0019] (“Depending on the implementation, the selection of data used to set a base policy can be generalized or it can be specific to one or more details of the thermostat, such as geospacial location, geographic features (e.g., based on the location, what is the tree coverage, for example, as determined by satellite imagery), associated HVAC device(s) (e.g., type, capacity, model, etc. of the heating and/or cooling systems/sensors), user demographics, home/commercial implementation, and/or the like.”) Barrett: Paragraph [0124] (“…the ACPLHVAC controller and/or a computer employ various forms of memory 529.”) [The base policy stored reads on “a set of rules”.]
…
determining by the processing unit one or more commands for controlling an appliance based on the at least one current environmental characteristic value, the at least one set point and the at least one room characteristic; Barrett: Paragraph [0100] (“At the end of each epoch the ACPLHVAC observes the current state of the environment and chooses whether or not to execute an automated HVAC action (turn on or off). Table 1 below illustrates the typical information available, with readings for current temperature, humidity, set points and whether or not the room is booked at that particular time.”) [Whether or not to execute the automated HVAC action reads on “determining…one or more commands”.  The current temperature and humidity reads on “at least one current environmental characteristic value” and whether or not the room is booked reads on “at least one room characteristic”.]  
…
receiving by the processing unit at least one updated environmental characteristic value from the environment controller via the communication interface; Barrett: Paragraph [0038] (“It observes the value of the next state s′ which is generated executing the action 410.  % Observe s′ the next state currentTemp=getTempReading( ) % Returns reading from thermostat sensor…”) Barrett: Paragraph [0087] (“As far as the this implementation of the ACPLHVAC is concerned, the following parameters may be observed when choosing an action…”) Barrett: Paragraph [0088] (“rt: is the room temperature, (source: temperature sensors, unit: degrees Celsius)”) Barrett: Paragraph [0106] (“FIG. 5 shows a block diagram illustrating embodiments of a ACPLHVAC controller.”) Barrett: Paragraph [0112] (“The CPU interacts with memory [The return reading of the room temperature from the thermostat sensor at the ACPLHVAC in a next state reads on “receiving by the processing unit at least one updated environmental characteristic value”. The communication of the ACPLHVAC controller beyond through various interfaces with the thermostat sensor reads on “from the environment controller via the communication interface”.]
determining by the processing unit a value of a reinforcement signal based on the at least one set point, the at least one updated environmental characteristic value and the set of rules, the value of the reinforcement signal being one of positive reinforcement or negative reinforcement; and Barrett: Paragraph [0032] (“An ACPLHVAC controller may take any of these actions accorded to it in its action space at the end of any defined discrete time interval (“epoch”). Determining which action to choose at the end of each epoch is the fundamental operating procedure of the unit.”) Barrett: Paragraph [0036] (“FIG. 4B illustrates one embodiment of a reinforced learning component for an isolated (non-networked) unit… The unit begins in a state s that may include such things as the current time, the current temperature, a setpoint temperature, a setpoint humidity, a time to the setpoint, if the heat is on, if the heat is off, the MOS of occupants, and/or the like…”) Barrett: Paragraph [0037] (“The unit can choose an action a (in this embodiment the action space is defined as {turn heat on, turn heat off}) to execute using a policy π 406 and executes that action 408.”) Barrett: Paragraph [0038] (“It observes the value of the next state s′ which is generated executing the action 410.  % Observe s′ the next state currentTemp=getTempReading( ) % Returns reading from thermostat sensor…”) Barrett: Paragraph [0045] (“Transitioning to s′ results in a reward (which may be positive or negative) 414.”) [The ACPLHVAC controller reads on “the processing unit” and the reinforced learning component of the ACPLHVAC unit based on the setpoint temperature, current temperature of the next state, and the policy reads on “based on the at least one set point, the at least one updated environmental characteristic value and the set of rules”.  The reward being positive or negative reads on “one of positive reinforcement or negative reinforcement”.]
…
executing by the processing unit a neural network training engine to update the predictive model based on: …the value of the reinforcement signal. Barrett: Paragraph [0047] (“The value of performing the given action is calculated from the reward, the expectation of what reward should be received, and a weighting of future rewards 416.”) Barrett: Paragraph [0101] (“Decisions are made based on these learned value function representations. A number of embodiments of reinforcement learning methods could be used for control, including, but not limited to Bayesian RL, SARSA, TD(λ), TD(0), Monte Carlo methods, model based approximation techniques combined with methods from dynamic programming and the corresponding action sampling methods such as ε-greedy, unbiased Bayesian sampling, myopic sampling, softmax selection (Gibbs, Boltzmann), greedy, interval estimation, among others.”) Barrett: Paragraph [0102] (“The ACPLHVAC learning controller can attempt to optimise the heating and cooling of the room as a multi-criteria problem.”) [The optimization of the reinforcement learning methods reads on “update the predictive model”, and the reward reads on “the value of the reinforcement signal”.]
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Montalvo and Barrett 
One of ordinary skill in the art at the time of the invention would have been motivated to make this modification because it would improve the efficiency of learning and provide an overall reduction in the processing required to provide comfortable and energy-efficient environments to users. Barrett: Paragraph [0019]. The combination of the cited references would allow the ACPLHVAC to balance state space variable adjustment (such as temperature, humidity, and/or the like) with associated energy costs (such as fuel, equipment wear/tear, and/or the like). Barrett: Paragraph [0029].
Regarding claim 2, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the predictive model comprises weights used by the neural network inference engine, and … Montalvo: Paragraph [0130] (“In a further embodiment, the determination of a demand reduction action from a hierarchy is performed using artificial intelligence supplemented by neural networks. For example, the neural networks may apply weightings to demand reduction actions in view of the monitoring data indicating interior space temperature and available daylight, to provide that the determined demand reduction action may implement an energy curtailment objective with minimal undesired impact on the end user.”)
Montalvo does not expressly teach “updating the predictive model comprises updating the weights.”  However, Barrett teaches:
updating the predictive model comprises updating the weights. Barrett: Paragraphs [0047], [0101], and [0102] [As described in claim 1.] Barrett: Paragraph [0029] (“In some instances, the ACPLHVAC may be adjusted by the user or an administrator to give different goals more weight through its reinforced learning reward system. In some instances, user comfort level may be weighted more heavily than energy cost. In some instances, energy costs may be weighted more heavily. In some instances, other factors such as season, environment, demographics of occupancy, and/or the like may be weighted by the system. In some instances, all these weights may be adjustable both prior to and during ACPLHVAC operation.”) [The optimization of the reinforcement learning methods reads on “update the predictive model” and the adjustable weight reads on “updating the weights”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 3, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the determination of the value of the reinforcement signal further takes into consideration the at least one room characteristic, the at least one current environmental characteristic value, or a combination thereof.”  However, Barrett teaches:
The method of claim 1, wherein the determination of the value of the reinforcement signal further takes into consideration the at least one room characteristic, the at least one current environmental characteristic value, or a combination thereof. Barrett: Paragraph [0036] [As described in claim 1.] [The reinforced learning component considering if the heat is on, if the heat is off, the MOS of occupants, and/or the like reads on “the at least one room characteristic”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 4, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic is received from the environment controller via the communication interface of the training server.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic is received from the environment controller via the communication interface of the training server. Barrett: Paragraph [0021] (“In one instance of the ACPLHVAC, demographic information of occupants may be used to determine shared optimizations across spaces, such as age compatibility. In some instances of the ACPLHVAC, units may be pre-loaded with state space specifics, such as information about the size and shape of the space, buildings of similar specifications, occupancy background and size, historical data and/or the like. In some instances of the ACPLHVAC, units may contain no pre-loaded information.”) [As shown in FIG. 5, the interface bus 507 interfaces between user input devices and/or peripheral devices and the ACPLHVAC controller.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 5, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic comprises at least one of the following: a room type identifier selected among a plurality of room type identifiers, one or more geometric characteristics of the room, and a human activity in the room.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic comprises at least one of the following: a room type identifier selected among a plurality of room type identifiers, one or more geometric characteristics of the room, and a human activity in the room. Barrett: Paragraph [0021] (“In one instance of the ACPLHVAC, demographic information of occupants may be used to determine shared optimizations across spaces, such as age compatibility. In some instances of the ACPLHVAC, units may be pre-loaded with state space specifics, such as information about the size and shape of the space, buildings of similar specifications, occupancy background and size, historical data and/or the like. In some instances of the ACPLHVAC, units may contain no pre-loaded information.”) 
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 6, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one room characteristic comprises a human activity in the room, the human activity in the room comprising at least one of the following: periods of time when the room is occupied by humans, and a type of activity performed by humans occupying the room.”  However, Barrett teaches:
The method of claim 1, wherein the at least one room characteristic comprises a human activity in the room, Barrett: Paragraph [0028] (“This also allows users to move through a number of comfort settings over the course of a day, to reflect how they are feeling in response to activities or stimuli (for example, a user may return to the environment after exercising and prefer cooler conditions. Later in the day, as their internal temperature returns to normal, they desire to warm the environment again). In one embodiment, the ACPLHVAC may  the human activity in the room comprising at least one of the following: 
periods of time when the room is occupied by humans, and Barrett: Paragraph [0036] (“…The unit begins in a state s that may include such things as the current time, … the MOS of occupants, and/or the like.”) Barrett: Paragraph [0098] (“…optimise the number of air changes per hour based on the occupancy of the room…”)
a type of activity performed by humans occupying the room. Barrett: Paragraph [0028] [As described above.] [Exercising reads on “a type of activity”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 7, this claim incorporates the rejection presented in claim 1. Montalvo further teaches: 
The method of claim 1, wherein the at least one current environmental characteristic value comprises at least one of the following: a current temperature, a current humidity level, a current carbon dioxide (CO2) level, and a current room occupancy. Montalvo: Paragraph [0093] (“The monitor 200 is a device that may be connected to the appliances 170, and conventional electronic environmental sensors, such as electronic temperature and relative humidity sensors located within an interior and exterior to a facility of the end user. The monitor 200 may generate or collect data representative of environmental information available at the sensors.”) [The collected data from the temperature and humidity sensors reads on “a current temperature, a current humidity level”.]
Regarding claim 8, this claim incorporates the rejection presented in claim 1.  Montalvo does not expressly teach “the at least one updated environmental characteristic value comprises Barrett teaches:
The method of claim 1, wherein the at least one updated environmental characteristic value comprises at least one of the following: an updated temperature, an updated humidity level, and an updated carbon dioxide (CO2) level. Barrett: Paragraphs [0038] and [0087] [As described in claim 1.] [The return reading of the room temperature from the thermostat sensor at the ACPLHVAC in a next state reads on “an updated temperature”.]
The motivation to combine Montalvo and Barrett, which teach the features of the present claim, as submitted in independent claim 1, is incorporated herein.
Regarding claim 9, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the at least one set point comprises at least one of the following: a target temperature, a target humidity level, and a target CO2 level. Montalvo: Paragraph [0121] (“…the temperature setpoint of an A/C system to a lower temperature during morning hours to pre-cool interior space of the facility before the afternoon…”)
Regarding claim 10, this claim incorporates the rejection presented in claim 1.  Montalvo further teaches:
The method of claim 1, wherein the one or more commands for controlling the appliance include at least one of the following: 
a command for controlling a speed of a fan, a command for controlling a pressure generated by a compressor, and Montalvo: Paragraph [0119] (“…the demand reduction action may be to reset an existing speed setting of a fan, blower and/or pump of an appliance 170…”)
a command for controlling a rate of an airflow through a valve.
Claims 11-19 recite a training server that is implementing the functions of the method of claims 1, 3, 2, and 5-10 with substantially the same limitations, respectively.  Therefore, the rejections applied to claims 1, 3, 2, and 5-10 above also apply to claims 11-19.
Regarding independent claim 20, Montalvo teaches:
A non-transitory computer program product comprising instructions executable by a processing unit of a training server, the execution of the instructions by the processing unit providing for generating a predictive model for controlling an appliance by: Montalvo: Paragraph [0060] (“Referring to FIG. 3, in one exemplary embodiment, the DR server 14 may include a processor 100, a memory 102, a communications network interface device 108 and other components typically present in a general purpose computer.”) Montalvo: Paragraph [0061] (“The memory 102 stores information accessible by the processor 100, including instructions 104 that may be executed by the processor 100.”) Montalvo: Paragraph [0062] (“The processor 100 may be any well-known processor, such as processors from Intel Corporation or AMD.”) Montalvo: Paragraph [0063] (“The instructions 104 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor 100. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.”)
The remaining recitations of independent claim 20 are implementing the functions of the method of independent claim 1 with substantially the same limitations.  Therefore, the rejection applied to independent claim 1 above also applies to independent claim 20.
It is noted that any citations to specific pages, columns, lines, or figures in the prior art references, and any interpretation of the references, should not be considered limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US Patent Publication No. 2017/0364829 A1 to Fyffe describes reinforcement learning methods computing a value function expressing the expected longer term reward of a state, and may also compute a predictive model of the environment in terms of sequences of states, actions, and rewards. Reinforcement learning agents are dependent on the crafting of an appropriate reward function towards some task or goal, for example scoring points in a game, and are further dependent on hand-tuned parameters controlling learning rates, future reward discount factors, and exploit-vs-explore trade-offs.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALICIA M. CHOI whose telephone number is (571)272-1473.  The examiner can normally be reached on Monday - Friday 7:30 am to 5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio Del Mar Perez-Velez can be reached on 571-270-5935.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/A.M.C./
Patent Examiner
Art Unit 2117

/ROCIO DEL MAR PEREZ-VELEZ/
Supervisory Patent Examiner, Art Unit 2117