The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This communication is responsive to the Amendment filed 04/13/2022.
Claims 1-14 have been examined.

Response to Amendment
In the instant amendment, claims 1 and 13-14 have been amended.
The 35 USC §101 rejection over claim 13 is withdrawn in view of Applicant’s amendments.

Information Disclosure Statement
As required by M.P.E.P. 609, the applicant’s submission of the Information Disclosure Statement dated 04/29/2022 is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending.

Allowable Subject Matter
Claims 2-5, 7 and 9-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over US 2018/0230967 to Beatrice et al. (hereafter “Beatrice”) in view of US 2017/0051681 to Arias et al. (hereafter “Arias”) and in further view of US 2004/0181334 to Blumbergs et al. (hereafter “Blumbergs”).

As per claim 1, Beatrice discloses a reinforcement learning method executed by a computer, the reinforcement learning method comprising: 
calculating a degree of risk at a current time of violating a constraint condition related to a state of a controlled object (FIGs. 9 and 13; paragraphs 0042 and 0083: “For instance, in situations where the first wind turbine 55 (e.g., wind turbine 1) has a risk level of high (corresponding to a high cut-in speed), the dashboard engine 77 may have logic for reducing the risk level (and the corresponding cut-in speed) of the first wind turbine 55 if the overall mortality rate per wind turbine 55 of the turbine site 57 is low enough that thresholds (e.g., a “bat threshold”) for mortality of volant animals 59 have not been reached.” → risk degree is high when the threshold (i.e., bat threshold) is reached (violated)), the predicted value being obtained from model information defining a relationship between the state of the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → base risk levels) and a control input to the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → the mortality data as a control input).
Beatrice does not explicitly disclose the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
Arias further discloses the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point (paragraph 0033: “For instance, the forecast model may use the history of the grid behavior to predict a likely future behavior. For instance, this may be used to modify the operation settings in a way that a certain rate of changes of grid frequencies can be fulfilled with the required power increase or decrease. For instance, if the system predicts with a certain likelihood that within a certain timeframe the grid frequency may drop by at maximum 1Hz at a maximum grid frequency gradient of 0.1Hz/s, this may be used as an input to the business model and the business model may commission the operating state or the operation settings such as to provide the required power reserve to compensate the maximum frequency drop at the forecast frequency drop gradient. On the other hand, the likelihood of not fulfilling the grid requirements with certain settings may be calculated. Dependent on the risk level taken in the business model, the operation settings may be chosen such that a certain risk of not fulfilling grid requirements is taken for the benefit of operating the power plant at a higher efficiency and revenue margin instead of providing the power reserve.”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Arias with Beatrice’s teaching because doing so would enable an improved prediction of the power plant response to a variation of certain controlled operation parameters (Arias, paragraph 0008).
Blumbergs further discloses determining the control input to the controlled object at the current time point (paragraphs 0026, 0050-0054 and 0060-0061), from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases (paragraphs 0026, 0050-0054 and 0060-0061: “According to the present invention, the navigation method and system allows dynamic access to different degrees of navigation functions when the vehicle is in motion based on the conditions surrounding the vehicle such as distances from other vehicles and obstacles detected by sensors provided on the vehicle. When the degree of risk involved is high, the access to the navigation functions is further limited, and when the degree of risk is low, a scope of the access to the navigation functions is increased.” → higher risk → fewer degrees of navigation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Blumbergs with Beatrice’s teaching and Arias’ teaching because doing so would provide dynamic changing of the scope of access to the navigation functions based on driving environment and conditions (Blumbergs, paragraph 0016).
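
For illustration only, the two limitations of claim 1 discussed above (a degree of risk calculated from a predicted future state, and a control input chosen from a range that narrows as that degree of risk increases) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the scalar linear model, the risk formula, the single upper-limit constraint, and all function and parameter names are hypothetical.

```python
import random


def predict_next_state(state: float, control: float,
                       a: float = 1.0, b: float = 0.5) -> float:
    """Hypothetical model information: next state as a linear function
    of the current state and the control input (a, b are assumed)."""
    return a * state + b * control


def degree_of_risk(state: float, nominal_control: float,
                   upper_limit: float) -> float:
    """Assumed risk measure in [0, 1] for the constraint state <= upper_limit
    (upper_limit > 0), based on the predicted state at the next time point."""
    predicted = predict_next_state(state, nominal_control)
    margin = upper_limit - predicted
    if margin <= 0:
        return 1.0  # predicted state already violates the constraint
    return max(0.0, 1.0 - margin / upper_limit)


def control_range(nominal_control: float, risk: float,
                  max_half_width: float = 2.0) -> tuple[float, float]:
    """Range around the nominal input; its width shrinks as risk grows."""
    half_width = max_half_width * (1.0 - risk)
    return nominal_control - half_width, nominal_control + half_width


def determine_control(state: float, nominal_control: float,
                      upper_limit: float, rng: random.Random) -> float:
    """Pick the current control input from the risk-dependent range."""
    risk = degree_of_risk(state, nominal_control, upper_limit)
    low, high = control_range(nominal_control, risk)
    return rng.uniform(low, high)
```

In this sketch a state far from the limit yields a low degree of risk and a wide exploration range, while a state whose prediction reaches the limit yields a degree of risk of 1 and a range collapsed onto the nominal input.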
As per claim 13, Beatrice discloses a non-transitory computer-readable storage medium storing therein a reinforcement learning program that causes a computer to execute a process, the process comprising:
calculating a degree of risk at a current time of violating a constraint condition related to a state of a controlled object (FIGs. 9 and 13; paragraphs 0042 and 0083: “For instance, in situations where the first wind turbine 55 (e.g., wind turbine 1) has a risk level of high (corresponding to a high cut-in speed), the dashboard engine 77 may have logic for reducing the risk level (and the corresponding cut-in speed) of the first wind turbine 55 if the overall mortality rate per wind turbine 55 of the turbine site 57 is low enough that thresholds (e.g., a “bat threshold”) for mortality of volant animals 59 have not been reached.” → risk degree is high when the threshold (i.e., bat threshold) is reached (violated)), the predicted value being obtained from model information defining a relationship between the state of the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → base risk levels) and a control input to the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → the mortality data as a control input).
Beatrice does not explicitly disclose the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
Arias further discloses the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point (paragraph 0033: “For instance, the forecast model may use the history of the grid behavior to predict a likely future behavior. For instance, this may be used to modify the operation settings in a way that a certain rate of changes of grid frequencies can be fulfilled with the required power increase or decrease. For instance, if the system predicts with a certain likelihood that within a certain timeframe the grid frequency may drop by at maximum 1Hz at a maximum grid frequency gradient of 0.1Hz/s, this may be used as an input to the business model and the business model may commission the operating state or the operation settings such as to provide the required power reserve to compensate the maximum frequency drop at the forecast frequency drop gradient. On the other hand, the likelihood of not fulfilling the grid requirements with certain settings may be calculated. Dependent on the risk level taken in the business model, the operation settings may be chosen such that a certain risk of not fulfilling grid requirements is taken for the benefit of operating the power plant at a higher efficiency and revenue margin instead of providing the power reserve.”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Arias with Beatrice’s teaching because doing so would enable an improved prediction of the power plant response to a variation of certain controlled operation parameters (Arias, paragraph 0008).
Blumbergs further discloses determining the control input to the controlled object at the current time point (paragraphs 0026, 0050-0054 and 0060-0061), from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases (paragraphs 0026, 0050-0054 and 0060-0061: “According to the present invention, the navigation method and system allows dynamic access to different degrees of navigation functions when the vehicle is in motion based on the conditions surrounding the vehicle such as distances from other vehicles and obstacles detected by sensors provided on the vehicle. When the degree of risk involved is high, the access to the navigation functions is further limited, and when the degree of risk is low, a scope of the access to the navigation functions is increased.” → higher risk → fewer degrees of navigation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Blumbergs with Beatrice’s teaching and Arias’ teaching because doing so would provide dynamic changing of the scope of access to the navigation functions based on driving environment and conditions (Blumbergs, paragraph 0016).

As per claim 14, Beatrice discloses a reinforcement learning system comprising:
a memory (FIG. 7); and
a processor coupled to the memory (FIG. 7), the processor configured to:
calculate a degree of risk at a current time of violating a constraint condition related to a state of a controlled object (FIGs. 9 and 13; paragraphs 0042 and 0083: “For instance, in situations where the first wind turbine 55 (e.g., wind turbine 1) has a risk level of high (corresponding to a high cut-in speed), the dashboard engine 77 may have logic for reducing the risk level (and the corresponding cut-in speed) of the first wind turbine 55 if the overall mortality rate per wind turbine 55 of the turbine site 57 is low enough that thresholds (e.g., a “bat threshold”) for mortality of volant animals 59 have not been reached.” → risk degree is high when the threshold (i.e., bat threshold) is reached (violated)), the predicted value being obtained from model information defining a relationship between the state of the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → base risk levels) and a control input to the controlled object (FIGs. 9 and 13; paragraphs 0025-0026, 0028 and 0080: “The data server can also receive an assigned baseline risk rank for each of the wind turbines. In at least one example, the baseline rank can be tuned for a specific type and/or species of volant animal, such as bats. In such an example, the assignment of the baseline risk rank can be based on parameters (e.g., proximity of the one or more wind turbines to water, woods, etc.) associated with the one or more wind turbines and/or based on migration pattern data characterizing bat migrations at or near the turbine site. The data server programmed to assign an individual risk level (e.g., low, medium and high) to each of the wind turbines based on an assigned baseline risk rank and the mortality data.” → the mortality data as a control input).
Beatrice does not explicitly disclose the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point; and determine the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
Arias further discloses the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point (paragraph 0033: “For instance, the forecast model may use the history of the grid behavior to predict a likely future behavior. For instance, this may be used to modify the operation settings in a way that a certain rate of changes of grid frequencies can be fulfilled with the required power increase or decrease. For instance, if the system predicts with a certain likelihood that within a certain timeframe the grid frequency may drop by at maximum 1Hz at a maximum grid frequency gradient of 0.1Hz/s, this may be used as an input to the business model and the business model may commission the operating state or the operation settings such as to provide the required power reserve to compensate the maximum frequency drop at the forecast frequency drop gradient. On the other hand, the likelihood of not fulfilling the grid requirements with certain settings may be calculated. Dependent on the risk level taken in the business model, the operation settings may be chosen such that a certain risk of not fulfilling grid requirements is taken for the benefit of operating the power plant at a higher efficiency and revenue margin instead of providing the power reserve.”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Arias with Beatrice’s teaching because doing so would enable an improved prediction of the power plant response to a variation of certain controlled operation parameters (Arias, paragraph 0008).
Blumbergs further discloses determining the control input to the controlled object at the current time point (paragraphs 0026, 0050-0054 and 0060-0061), from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases (paragraphs 0026, 0050-0054 and 0060-0061: “According to the present invention, the navigation method and system allows dynamic access to different degrees of navigation functions when the vehicle is in motion based on the conditions surrounding the vehicle such as distances from other vehicles and obstacles detected by sensors provided on the vehicle. When the degree of risk involved is high, the access to the navigation functions is further limited, and when the degree of risk is low, a scope of the access to the navigation functions is increased.” → higher risk → fewer degrees of navigation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Blumbergs with Beatrice’s teaching and Arias’ teaching because doing so would provide dynamic changing of the scope of access to the navigation functions based on driving environment and conditions (Blumbergs, paragraph 0016).


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Beatrice in view of Arias and Blumbergs, as applied to claim 1, and in further view of US 2017/0061796 to Osagawa.

As per claim 6, Beatrice does not explicitly disclose the calculating and the determining are executed in an episode-type reinforcement learning in which a unit is defined as a period from initialization of the state of the controlled object until the state of the controlled object no longer satisfies the constraint condition, or a period from initialization of the state of the controlled object until a certain time elapses.
Osagawa further discloses the calculating and the determining are executed in an episode-type reinforcement learning in which a unit is defined as a period from initialization of the state of the controlled object until the state of the controlled object no longer satisfies the constraint condition, or a period from initialization of the state of the controlled object until a certain time elapses (FIG. 13; paragraphs 0136 and 0156-0157).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Osagawa with Beatrice’s teaching, Arias’ teaching and Blumbergs’ teaching because doing so would prevent unwanted alarms from being issued and avoid a collision between two objects (Osagawa, paragraph 0008).
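
For illustration only, the episode unit recited in claim 6 can be sketched as a loop that ends either when the state no longer satisfies the constraint condition or when a certain time elapses. The claim recites these as alternative definitions of the unit; the sketch below combines both termination conditions for brevity. It is a minimal sketch under stated assumptions, not taken from the application or from Osagawa; the function name, the callback signatures, and the step-count notion of elapsed time are hypothetical.

```python
from typing import Callable, Tuple


def run_episode(initial_state: float,
                step: Callable[[float], float],
                satisfies_constraint: Callable[[float], bool],
                max_steps: int) -> Tuple[int, str]:
    """Run one episode starting from the initialized state.

    Returns the number of steps taken and the reason the episode ended.
    """
    state = initial_state  # initialization of the controlled object's state
    for t in range(max_steps):
        if not satisfies_constraint(state):
            # The state no longer satisfies the constraint condition.
            return t, "constraint_violated"
        state = step(state)  # calculating and determining would occur here
    # A certain time (max_steps steps) has elapsed.
    return max_steps, "time_elapsed"
```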
	
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Beatrice in view of Arias and Blumbergs, as applied to claim 1, and in further view of US 2008/0120335 to Dolgoff.

As per claim 8, Beatrice does not explicitly disclose the model information uses a variable indicative of the state of the controlled object at any time point and a variable indicative of the control input to the controlled object at the any time point, to represent linear approximation of a function for the state of the controlled object at a time point subsequent to the any time point.
Dolgoff further discloses the model information uses a variable indicative of the state of the controlled object at any time point and a variable indicative of the control input to the controlled object at the any time point (FIG. 6; paragraph 0077: “the user can hold down the right mouse button to draw on either the graph in the main frame window 602, or the navigator window 650, and specify graphically what they would like the maintain value (set-point) to be at a given time.”), to represent linear approximation of a function for the state of the controlled object at a time point subsequent to the any time point (FIG. 6; paragraph 0077: “Every pixel in the sensor information area 610 represents 15 seconds, and a linear approximation is used if the user draws on the navigator window 650. The user can access the line tool by pressing the line tool button, which shows linear maintain value settings, or by selection from a pull-down menu. By holding down the right mouse button during use of the line tool, the user can specify the beginning and end points of a line that will represent a ramp in maintain value settings for the selected sensor. The user can access the zoom tool by pressing the button depicting a magnifying glass: by positioning the cursor over a point of interest, and holding down the right mouse button, and moving the mouse up, the graph will zoom in, by moving the mouse down, the graph will zoom out. In a preferred embodiment, the zoom behavior affects only the Y-axis of the graph.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Dolgoff with Beatrice’s teaching, Arias’ teaching and Blumbergs’ teaching because doing so would provide a novel graphical user interface that enhances and simplifies comprehension of readings, device states, variables, text notes, pictures, other values, and how they relate to each other, as well as settings and the current, past, and future status of an environment (Dolgoff, paragraph 0006).
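
For illustration only, model information of the kind recited in claim 8 is commonly written as a linear approximation of the next state in terms of the current state and control input, x[t+1] ≈ A·x[t] + B·u[t]. The sketch below is a minimal example under that assumption and is not taken from the application or from Dolgoff; the coefficient matrices A and B, the function name, and the plain-list representation are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear_next_state(A: Matrix, B: Matrix, x: Vector, u: Vector) -> Vector:
    """Approximate the state at the subsequent time point as A*x + B*u,
    where x is the state and u the control input at the current time point."""
    return [
        sum(A[i][j] * x[j] for j in range(len(x)))
        + sum(B[i][k] * u[k] for k in range(len(u)))
        for i in range(len(A))
    ]
```

For example, with a hypothetical two-dimensional state and a single control input, `linear_next_state([[1.0, 0.1], [0.0, 1.0]], [[0.0], [0.5]], [1.0, 2.0], [4.0])` evaluates the approximation for one time step.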

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Beatrice in view of Arias and Blumbergs, as applied to claim 1, and in further view of US 2016/0041074 to Pliskin.

As per claim 11, Beatrice does not explicitly disclose wherein the controlled object is an air conditioning facility.
Pliskin further discloses wherein the controlled object is an air conditioning facility (paragraphs 0015-0016).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pliskin with Beatrice’s teaching, Arias’ teaching and Blumbergs’ teaching because doing so would allow determining concentration levels of particulates in the air so that, based upon the levels of particulate determined, the unit can automatically change the speed of the blowers to either increase or decrease the volume of air drawn through the unit, or turn off the blowers altogether (Pliskin, paragraph 0015).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Beatrice in view of Arias and Blumbergs, as applied to claim 1, and in further view of US 2011/0184555 to Kosuge et al. (hereafter “Kosuge”).

As per claim 12, Beatrice does not explicitly disclose wherein the controlled object is an industrial robot.
Kosuge further discloses wherein the controlled object is an industrial robot (paragraphs 0085 and 0098).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kosuge with Beatrice’s teaching, Arias’ teaching and Blumbergs’ teaching because doing so would allow the work progress estimation unit to estimate the work progress status based on data input from the measuring unit while referring to the data on the procedure, and to select objects necessary for the next task when the work is found to have advanced to the next procedure (Kosuge, paragraph 0021).

Response to Arguments
Applicants’ arguments have been considered but are moot in view of the new ground(s) of rejection.  Applicants’ amendment necessitated the new ground(s) of rejection presented in this Office action. 

Conclusion
Applicants’ amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication should be directed to examiner Tuan Dao, whose telephone/fax numbers are (571) 270 3387 and (571) 270 4387, respectively. The examiner can normally be reached Monday through Thursday and the second Friday of the bi-week from 7:30AM to 5:00PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chat Do, can be reached at (571) 272 3721.
The fax phone number for the organization where this application or proceeding is assigned is (571) 273 8300. 
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the TC 2100 Group receptionist whose telephone number is (571) 272 2100.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/TUAN C DAO/Primary Examiner, Art Unit 2193