Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is in response to the Applicant’s communication filed on 02/21/2020.
Claims 1-11 are pending, where claims 1, 6 and 11 are independent.

Information Disclosure Statement 
The information disclosure statement (IDS) submitted on 02/21/2020 was filed on the filing date of the application. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Foreign Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. §119 (a)-(d). The non-English certified copy has been filed in parent Application No. JP 2019-039031, filed on 03/04/2019. 
	Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action, see 37 CFR 41.154(b) and 41.202(e). 

However, an English translation of the foreign application, in addition to the abstract, is required for the priority claim. See MPEP 213.04 and 2304.01(c), and 37 CFR § 1.55(g)(3)-(4).

Multiple Related Applications
Applicants have filed multiple related applications. To date, it appears that the related applications (e.g., Application Nos. 16/797353, 16/797515, 16/744948, 16/733880, 16/130482, 16/709144, 16/293724, 16/702676, 16/989899, 17/001706) remain pending and have yet to be examined. These are a plurality of co-pending related applications, and a double patenting issue is properly raised. See MPEP 804 and 1490(VI)(D).

Nonstatutory Double Patenting 
37 CFR 1.78(b) provides that when two or more applications filed by the same applicant contain conflicting claims, elimination of such claims from all but one application may be required in the absence of good and sufficient reason for their retention during pendency in more than one application.  Applicant is required to either cancel the conflicting claims from all but one application or maintain a clear line of demarcation between the applications.  See MPEP § 822.
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  See MPEP § 804.

Claims 1, 6 and 11 are provisionally rejected on the ground of nonstatutory double patenting over claims 1 and 13-14 of copending U.S. Patent application No. 16/797515 (corresponding USPGPub. No. 2020/0285208 A1).  This is a provisional double patenting rejection because the patentably indistinct claims have not in fact been patented.
The subject matter claimed in the instant application is fully disclosed in the referenced copending application and would be covered by any patent granted on that copending application, since the referenced copending US Patent application No. 16/797515 (corresponding USPGPub. No. 2020/0285208 A1) and the instant application claim common subject matter, as follows:

Instant Application No. 16/797353
Title: REINFORCEMENT LEARNING METHOD AND REINFORCEMENT LEARNING SYSTEM
Claim 1. A computer-implemented reinforcement learning method comprising:
determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to be the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, in a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of control object satisfies the constraint condition; and
determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.

US Application 16/797515 (corresponding USPGPub. No. 2020/0285208 A1)
Title: REINFORCEMENT LEARNING METHOD, RECORDING MEDIUM, AND REINFORCEMENT LEARNING SYSTEM
Compared limitation: determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.



Claims 1, 6 and 11 are provisionally rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1 and 2 of co-pending application 16/797515 (corresponding USPGPub. No. 2020/0285208 A1). Although the conflicting claims are not identical, they are not patentably distinct from each other (as shown in the table for comparison) because they are substantially similar (as for example the limitation “determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition” of the application is equivalent to the limitation “determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk” of the co-pending application).
It would therefore have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify or omit the additional elements of claims 1 and 13-14 of the co-pending application to arrive at claims 1, 6 and 11 of the instant application, because the remaining elements would perform the same functions as before.
This is a provisional obviousness-type nonstatutory double patenting rejection because the patentably indistinct claims have not yet been patented. See MPEP § 804.

Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Claims 1-11 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Cella, et al. (USPGPub No. 20190121350 A1) in view of Sugimoto, et al. (USPGPub No. 20120253514 A1).
As to claims 1, 6 and 11, Cella discloses a computer-implemented reinforcement learning method (Cella [0275-277] “environment 104 using machine learning to enable derivation-based learning outcomes - learn from and make decisions on a set of data, by making data-driven predictions and adapting according to the set of data - machine learning involve performing a plurality of machine learning tasks by machine learning systems - reinforcement learning - reinforcement learning include the machine learning systems performing in a dynamic environment and then providing feedback about correct and incorrect decisions” [abstract] “plurality of input sensors communicatively coupled to a controller, a data collection circuit - machine learning data analysis circuit structured to receive the output data and learn received output data patterns indicative of an outcome” see Fig. 1-160) comprising:
determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to be the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, in a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of control object satisfies the constraint condition (Cella [0275-277] “environment 104 using machine learning to enable derivation-based learning outcomes - learn from and make decisions on a set of data, by making data-driven predictions and adapting according to the set of data - machine learning involve performing a plurality of machine learning tasks by machine learning systems - reinforcement learning - reinforcement learning include the machine learning systems performing in a dynamic environment and then providing feedback about correct and incorrect decisions” [abstract] “plurality of input sensors communicatively coupled to a controller, a data collection circuit - machine learning data analysis circuit structured to receive the output data and learn received output data patterns indicative of an outcome - being seeded with a model based on industry-specific feedback” see Fig. 1-160, dynamic environment decisions using prediction with plurality of feedback for adapting preset data provides the target constraint condition).
Cella does not explicitly teach determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
However, Sugimoto discloses determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing (Sugimoto [claim 1] “control parameter value calculation unit that calculates a value of one or more control parameters maximizing a reward that is output by the reward function by substituting the value of the one or more first-type environment parameters into the reward function; a control parameter value output unit that outputs the value of the one or more control parameters to the control object” [0047-63] [abstract] see Fig. 1-11, calculating a control parameter for the control object provides determining a control input to the control object).

Cella and Sugimoto are analogous art from the same field of endeavor; they share overlapping structural and functional similarities, and both involve reinforcement learning.
Therefore, before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the above functionalities, as taught by Cella, by incorporating the calculation of a control parameter for the control object to determine a control input to the control object, as taught by Sugimoto.
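
For illustration only, the selection step recited in claim 1 (a control input determined by either the reinforcement learner or the controller, depending on whether the state satisfies the constraint condition) might be sketched as follows. The sketch assumes that the learner is used while the constraint is satisfied and the controller otherwise; that assignment, and every name in the code (select_control_input, within_limit, explore, recover), is hypothetical and appears in neither the claims nor the cited references.

```python
from typing import Callable, Sequence

State = Sequence[float]


def select_control_input(
    state: State,
    satisfies_constraint: Callable[[State], bool],
    learner_policy: Callable[[State], float],
    fallback_controller: Callable[[State], float],
) -> float:
    """Pick the control input from the learner or from the controller.

    Assumption: the learner acts while the constraint holds, and the
    controller acts once the constraint is violated.
    """
    if satisfies_constraint(state):
        return learner_policy(state)
    return fallback_controller(state)


# Toy stand-ins, hypothetical and for demonstration only.
def within_limit(state: State) -> bool:
    return state[0] <= 1.0  # constraint: first state component stays at or below 1.0


def explore(state: State) -> float:
    return 0.5  # placeholder for a learned (possibly exploratory) action


def recover(state: State) -> float:
    return -1.0  # placeholder for a fixed corrective action


print(select_control_input([0.2], within_limit, explore, recover))
print(select_control_input([1.7], within_limit, explore, recover))
```

The first call takes the learner branch because the toy constraint holds; the second takes the controller branch because it does not.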

As to claims 2 and 7, the combination of Cella and Sugimoto discloses all the limitations of the base claims as outlined above.
The combination further discloses The reinforcement learning method according to claim 1, wherein the determining of the parameter includes determining the parameter of the reinforcement learner such that the specific probability is set to a probability that is higher than the target probability and calculated based on the specific time and the target probability (Cella [0275-277] “environment 104 using machine learning to enable derivation-based learning outcomes - learn from and make decisions on a set of data, by making data-driven predictions and adapting according to the set of data - machine learning involve performing a plurality of machine learning tasks by machine learning systems - reinforcement learning - reinforcement learning include the machine learning systems performing in a dynamic environment and then providing feedback about correct and incorrect decisions” [0040-59] “compare relative phases of the first and second sensor signals - continuously monitored alarm having a pre-determined trigger condition when the third input is unassigned to or undetected at any of the multiple outputs.” [abstract] “plurality of input sensors communicatively coupled to a controller, a data collection circuit - machine learning data analysis circuit structured to receive the output data and learn received output data patterns indicative of an outcome - being seeded with a model based on industry-specific feedback” see Fig. 1-160, dynamic environment decisions using prediction with plurality of feedback for adapting preset data provides the target constraint condition).

As to claims 3 and 8, the combination of Cella and Sugimoto discloses all the limitations of the base claims as outlined above.
The combination further discloses The reinforcement learning method according to claim 1, wherein the reinforcement learner is configured to automatically adjust a search range of the control input so that the constraint condition is satisfied with the specific probability (Sugimoto [0047-63] “reinforcement learning apparatus 2 performs reinforcement learning learns an action rule that maximizes the reward, by searching a state space by trial and error - environment parameter obtaining unit 212 realized by a high-resolution stereo camera, range sensor, or the like” [claim 1] “control parameter value calculation unit that calculates a value of one or more control parameters maximizing a reward that is output by the reward function - environment parameters into the reward function; a control parameter value output unit that outputs the value of the one or more control parameters to the control object” [abstract] see Fig. 1-11, reinforcement learning system with range sensor provides automatic adjust the search range of control input).

As to claims 4 and 9, the combination of Cella and Sugimoto discloses all the limitations of the base claims as outlined above.
The combination further discloses The reinforcement learning method according to claim 1, wherein 
the control object is a wind power generation facility, and the reinforcement learner uses a generator torque of the wind power generation facility as the control input, at least one of a power generation amount of the power generation facility, a rotation amount of a turbine of the power generation facility, a rotation speed of the turbine of the power generation facility, a wind direction for the power generation facility, and a wind speed for the power generation facility as the state, and the power generation amount of the power generation facility as a reward so as to perform reinforcement learning for learning a policy for controlling the control object (Cella [0271-281] “machines such as turbines, windmills, industrial vehicles, robots, and the like - data collection system 102 deployed in the environment 104 - monitor the motor, the rotary encoder, and the potentiometer of the servomechanism - progress of industrial processes - environment 104 using machine learning to enable derivation-based learning outcomes - machine learning performing a plurality of machine learning tasks by machine learning systems - reinforcement learning include the machine learning systems performing in a dynamic environment” [0040-59] [abstract] “plurality of input sensors communicatively coupled to a controller, a data collection circuit - machine learning data analysis circuit structured to receive the output data and learn received output data patterns indicative of an outcome - being seeded with a model based on industry-specific feedback” see Fig. 1-160, control object of turbines, windmills provides the wind power generation facility and its performance).

As to claims 5 and 10, the combination of Cella and Sugimoto discloses all the limitations of the base claims as outlined above.
The combination further discloses The reinforcement learning method according to claim 1, wherein the specific time is defined by the number of steps of determining the control input, and the determining of the parameter includes determining the specific probability as a power root of the number of steps corresponding to the target probability (Cella [0271-281] “machines such as turbines, windmills, industrial vehicles, robots, and the like - data collection system 102 deployed in the environment 104 - monitor the motor, the rotary encoder, and the potentiometer of the servomechanism - progress of industrial processes - environment 104 using machine learning to enable derivation-based learning outcomes - machine learning performing a plurality of machine learning tasks by machine learning systems - reinforcement learning include the machine learning systems performing in a dynamic environment” [0040-59] [abstract] “plurality of input sensors communicatively coupled to a controller, a data collection circuit - machine learning data analysis circuit structured to receive the output data and learn received output data patterns indicative of an outcome - being seeded with a model based on industry-specific feedback” see Fig. 1-160, reinforcement learning performing plurality of machine learning tasks by machine learning systems in dynamic environment provides an optimization process under plurality of steps obviously includes power root).
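
For illustration only, one reading of the "power root" language of claims 5 and 10 is that the specific probability is the n-th root of the target probability, where n is the number of steps defining the specific time. Under that reading the result exceeds the target probability whenever the target is below 1 and n is greater than 1, which is consistent with claims 2 and 7. The function name and the input checks below are hypothetical and are not taken from the application or the cited references.

```python
import math


def per_step_probability(target_probability: float, num_steps: int) -> float:
    """Return target_probability ** (1 / num_steps).

    Assumption: "power root of the number of steps" is read as the
    num_steps-th root of the target probability.
    """
    if not 0.0 < target_probability <= 1.0:
        raise ValueError("target_probability must be in (0, 1]")
    if num_steps < 1:
        raise ValueError("num_steps must be a positive integer")
    return target_probability ** (1.0 / num_steps)


# Example: a target of 0.81 over 2 steps gives a value of about 0.9 per step.
p = per_step_probability(0.81, 2)
print(math.isclose(p, 0.9))
print(p > 0.81)
```

Raising the returned value to the power of the number of steps recovers the target probability, which is the relationship this reading relies on.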

Citation of Pertinent Prior Art
It is noted that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2141.02(VI) (prior art must be considered in its entirety, i.e., as a whole) and MPEP 2123.
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record:
Sivasubramanian, et al. USP No. 8,429,097 B1.
Takuda, et al. USPGPub No. 20170255177 A1. 
Li, et al. USPGPub No. 2020/0266743 A1. 
Sekiai, et al. USPGPub No. 20090132095 A1.
Wen, et al. USPGPub No. 20210278825 A1.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Md Azad whose telephone number is (571)272-0553. The examiner can normally be reached on Mon-Thu 9AM-5PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mohammad Ali, can be reached at (571)272-4105. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Md Azad/
Primary Examiner, Art Unit 2119