DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is a first action on the merits of the instant application.
Claims 1-10 are pending in this action.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Brian et al. (Brian) (WO 2017/201662 A1) in view of Powell et al. (Powell) (US 2015/0242946 A1).
As per claim 1: Brian discloses a reinforcement learning method executed by a computer, the reinforcement learning method comprising:
calculating (evaluate), in reinforcement learning of repeatedly executing a learning step according to a state or an action of a control target, a contribution level of the state or the action of the control target used in the learning step, the contribution level of the state or the action to the reinforcement learning being calculated for each learning step and calculated using a basis function used for representing the value function (see at least pages 1-5 and 12);
determining whether to update the value function, based on the value function after each learning step and the calculated contribution level calculated in each learning step (see abstract); and
updating the value function when the determining determines to update the value function (see abstract).
However, Brian does not explicitly teach a value function that has monotonicity as a characteristic of a value. In the same field of endeavor, Powell teaches a new process that combines lookup tables while exploiting the monotonicity of the value function, which captures the behavior that the value becomes larger as each variable in the state variable becomes larger (see abstract; par. 0030). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Brian with that of Powell for the advantage of capturing the behavior that the value becomes larger as each variable in the state variable becomes larger (see par. 0030).
As per claim 2: Brian teaches the reinforcement learning method according to claim 1, further comprising updating an experience level function that defines, by the basis function, an experience level in the reinforcement learning for each state or action of the control target, based on the calculated contribution level calculated in each learning step (see page 1), wherein
the determining whether to update the value function is determined based on the value function after the learning step and the updated experience level function (see abstract; pages 8-9).
As per claim 3: Brian teaches the reinforcement learning method according to claim 2, wherein when the value function is to be updated, the updating the experience level function includes further updating the experience level function such that the experience level of the state or the action of the control target used in the learning step is increased in the reinforcement learning (see abstract; pages 4, 7-9).
As per claim 4: Brian teaches the reinforcement learning method according to claim 2, wherein the updating the value function includes updating the value function such that the value of the state or the action of the control target used in the learning step approaches a value of a second state or a second action of the control target, the second state or the second action having a second experience level that is greater than the experience level of the state or the action of the control target used in the learning step (see abstract; pages 1-3).
As per claim 5: Brian teaches the reinforcement learning method according to claim 2, wherein the updating the value function includes updating the value function such that a value of a second state or a second action of the control target and having a second experience level that is smaller than the experience level of the state or the action of the control target used in the learning step approaches the value of the state or the action of the control target used in the learning step (see pages 1, 7, 9-11, 14).
As per claim 6: Brian teaches the reinforcement learning method according to claim 2, wherein the monotonicity is monomodality, and the determining whether to update the value function includes determining to update the value function when the state or the action of the control target used in the learning step is interposed between two states or actions of the control target, the two states or actions having a second experience level that is greater than the experience level of the state or the action of the control target used in the learning step (see pages 1-6, 18-19).
As per claim 7: Brian teaches the reinforcement learning method according to claim 1, wherein the determining whether to update the value function includes again determining whether to update the value function after the learning step is executed a predetermined number of times after the determining determines not to update the value function (see pages 1-6, 11-14, 18-19).
As per claim 8: Brian teaches the reinforcement learning method according to claim 1, wherein the determining whether to update the value function is determined based on the value function after a previous learning step and the calculated contribution level before a learning result of a current learning step is reflected to the value function (see pages 3-15, 19); and
updating the value function includes reflecting the learning result of the current learning step to the value function and updating the value function when the determining determines to update the value function and includes reflecting the learning result of the current learning step to the value function when the determining determines not to update the value function (see pages 1-3).
As per claim 9: the features of claim 9 are similar to the features of claim 1, except claim 9 is directed to a non-transitory, computer-readable recording medium storing therein a reinforcement learning program that causes a computer to execute a process, which is obvious within the combined prior art as indicated in the detailed rejection of claim 1. Therefore, claim 9 has been rejected on the same ground and motivation as claim 1.
As per claim 10: the features of claim 10 are similar to the features of claim 1, except claim 10 is directed to an apparatus comprising: a memory; and a processor coupled to the memory, wherein such an apparatus is considered to be obvious within the modified applied prior art. Therefore, claim 10 has been rejected on the same ground and motivation as claim 1.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MELESS NMN ZEWDU whose telephone number is (571)272-7873. The examiner can normally be reached M-F 8:30 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jinsong Hu, can be reached at (571) 272-3965. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MELESS N ZEWDU/
Primary Examiner, Art Unit 2643
6/15/2022