Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments and/or arguments filed on 12/29/2020. As per applicant's request, claims 1, 3-4, 6, 9-11, 13-14, 16 and 19-20 have been amended. No new claims have been added. Claims 2 and 12 have been cancelled. Claims 1, 3-11, and 13-20 remain pending.

In response to applicant’s amendments and/or arguments filed on 12/29/2020, the 35 U.S.C. 103 rejections made for claims 1-20 in the previous office action have been withdrawn.

In response to applicant’s amendments and/or arguments filed on 12/29/2020, the 35 U.S.C. 112(b) rejections made for claims 2, 10, 12, and 19 in the previous office action have been withdrawn.


Reason for Allowance
The following is an examiner's statement of reasons for allowance: Claims 1, 3, 9, 11, 13, and 19 are considered to be allowable as none of the references of record, either alone or in combination, fairly disclose or suggest the combination of limitations specified in the independent claims, including at least:

Regarding Claims 1 and 11,
	
	[Recited claim limitations appear as images in the original action: media_image1.png, media_image2.png, media_image3.png]

	The closest prior art of record includes Hu which discloses a reinforcement learning algorithm with an actor-critic configuration and adjusted learning of rewards and losses. However, Hu is silent with regard to the recited reward loss formula using both specific personalized rewards and common rewards.
	In addition, Nguyen discloses a reinforcement learning system that includes a reward function in order to determine rewards for specific actions. However, Nguyen is also silent with regard to the recited reward loss formula using both specific personalized rewards and common rewards.
	Furthermore, Mnih discloses a system that performs backpropagation by referring to a reward loss. However, Mnih is silent with regard to the recited reward loss formula using both specific personalized rewards and common rewards.
	
Regarding Claims 3, 9, 13, and 19,
	the learning device has
	(i) instructed the adjustment reward network to generate one or more second adjustment rewards corresponding to each of the common optimal actions to be performed at each of the timings of the driving trajectories by referring to the actual circumstance vectors,
	(ii) instructed the common reward module to generate one or more second common rewards corresponding to each of the common optimal actions to be performed at each of the timings of the driving trajectories by referring to the actual circumstance vectors, and
	(iii) instructed the estimation network, by referring to each of one or more virtual circumstance vectors corresponding to each of virtual circumstances caused by performing the common optimal actions at each of the timings of the driving trajectories, to generate one or more virtual prospective values corresponding to the virtual circumstances; and
	the learning device has instructed a second loss layer to generate at least one adjustment reward loss by referring to
	(i) each of second personalized rewards corresponding to each of the second adjustment rewards and each of the second common rewards,
	(ii) the virtual prospective values, and
	(iii) the actual prospective values, and to perform backpropagation by referring to the estimation loss, to thereby learn at least part of parameters of the estimation network.
	The closest prior art of record includes Hu which discloses a reinforcement learning algorithm with an actor-critic configuration and adjusted learning of rewards and losses. However, Hu is silent with regard to a personalized reward that corresponds to each of the adjustment rewards and each of the common rewards.
	In addition, Nguyen discloses a reinforcement learning system that includes a reward function in order to determine rewards for specific actions. However, Nguyen is also silent with regard to a personalized reward that corresponds to each of the adjustment rewards and each of the common rewards.
	Furthermore, Mnih discloses a system that performs backpropagation by referring to a reward loss. However, Mnih is silent with regard to a personalized reward that corresponds to each of the adjustment rewards and each of the common rewards.
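	For orientation only, the data flow recited for Claims 3, 9, 13, and 19 may be sketched as below. The sketch is not the claimed formula and is not drawn from any reference of record. It assumes that a personalized reward is the sum of an adjustment reward and a common reward, that the actual prospective values are produced by the estimation network from the actual circumstance vectors, and that the loss is a squared temporal-difference error with a discount factor gamma. All function and parameter names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def adjustment_reward_loss(
    actual_vectors: Sequence[Vector],
    virtual_vectors: Sequence[Vector],
    adjustment_reward_fn: Callable[[Vector], float],
    common_reward_fn: Callable[[Vector], float],
    value_fn: Callable[[Vector], float],
    gamma: float = 0.99,
) -> float:
    """Mean squared TD-style error over one trajectory (assumed loss form).

    adjustment_reward_fn stands in for the adjustment reward network,
    common_reward_fn for the common reward module, and value_fn for the
    estimation network. These mappings are illustrative assumptions.
    """
    if not actual_vectors or len(actual_vectors) != len(virtual_vectors):
        raise ValueError("need one virtual vector per actual vector")

    total = 0.0
    for actual, virtual in zip(actual_vectors, virtual_vectors):
        # Assumption: personalized reward = adjustment reward + common reward.
        personalized = adjustment_reward_fn(actual) + common_reward_fn(actual)
        # Virtual prospective value: estimate for the circumstance that
        # would follow from performing the common optimal action.
        target = personalized + gamma * value_fn(virtual)
        # Actual prospective value: estimate for the current circumstance.
        total += (target - value_fn(actual)) ** 2
    return total / len(actual_vectors)
```

	In a learning setting, a loss of this kind would be minimized by backpropagation through whichever network is being trained; the sketch computes only the scalar loss.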

Dependent claims 4-8, 10, 14-18, and 20 are allowed as they depend upon an allowable independent claim.

Any comments considered necessary by the applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VASYL DYKYY whose telephone number is (571) 270-5019. The examiner can normally be reached on M-F 7:30 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/V.D./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122