


P.O. Box 1450, Alexandria, Virginia 22313-1450 – WWW.USPTO.GOV


   Examiner’s Detailed Office Action   

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Sabine M. Volkmer Ward, Reg. No. 66,559, on 07/26/2021.

1.	A system, comprising:	a communications interface;	one or more processing unit(s); and	one or more computer-readable media having thereon computer-executable instructions, the computer-executable instructions, upon execution, causing the one or more processing unit(s) to perform operations for coordinated training and operation of computational models, the operations comprising:	operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;	operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;

5.	A system as recited in claim 1, wherein the selected action corresponds to a highest expectation value of the expectation values.

6.	A system as recited in claim 1, wherein:	the first RNN computational model is further operated to provide a predicted observation value and the first RNN computational model is trained further based on the predicted observation value and the second observation value.

7.	A system as recited in claim 6, further comprising a sensor coupled to the communications interface and configured to provide the second observation value.

8.	A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.

9.	A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.

10.	A method for coordinated training and operation of computational models, the method comprising:	operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;	operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;	selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface;	receiving a first reference result value and a second observation value via the communications interface;	training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;	operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and	training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

12.	A method as recited in claim 10, wherein the first observation value comprises a sensor reading.

13.	A method as recited in claim 10, further comprising:	operating the first RNN computational model to further provide a predicted observation value; and	training the first RNN computational model further based on the predicted observation value and the second observation value to provide the second RNN computational model.

14.	A method as recited in claim 10, wherein the receiving the second observation value is performed after the providing the indication.

16.	A non-transitory computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations for coordinated training and operation of computational models, the operations comprising:	operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;	operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;	selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface;	receiving a first reference result value and a second observation value via the communications interface;	training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;	operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and	training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

20.	A non-transitory computer-readable medium as recited in claim 16, wherein one or more of the plurality of values of the training data comprise respective sensor readings.

End of Claim Amendments
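
For illustration only, and not as part of the claims or the record, the sequence of steps recited in claim 10 can be sketched as follows. The sketch is a minimal one under stated assumptions: the classes TinyRNN and TinyQN, the function coordinated_step, the environment callback standing in for the communications interface, the scalar state, the squared-error gradient step used as the supervised-learning update rule, and the Q-learning step that treats the reference result value as the reward are all hypothetical choices made here and are not taken from the specification.

```python
import math


class TinyRNN:
    """Scalar recurrent cell with a linear read-out (hypothetical stand-in)."""

    def __init__(self, w_h=0.5, w_o=0.5, b=0.0, v=0.1, c=0.0):
        self.w_h, self.w_o, self.b = w_h, w_o, b  # recurrent-cell parameters
        self.v, self.c = v, c                     # read-out parameters

    def operate(self, prev_state, observation):
        """Return (state information, predicted result value)."""
        state = math.tanh(self.w_h * prev_state + self.w_o * observation + self.b)
        return state, self.v * state + self.c

    def train(self, prev_state, observation, reference, lr=0.1):
        """Supervised step: one gradient step on 0.5 * (predicted - reference)**2.

        Returns a new ("second") model; this model is left unchanged.
        """
        state, predicted = self.operate(prev_state, observation)
        err = predicted - reference
        d_pre = err * self.v * (1.0 - state * state)  # gradient at the tanh input
        return TinyRNN(
            w_h=self.w_h - lr * d_pre * prev_state,
            w_o=self.w_o - lr * d_pre * observation,
            b=self.b - lr * d_pre,
            v=self.v - lr * err * state,
            c=self.c - lr * err,
        )


class TinyQN:
    """One linear unit per action (hypothetical stand-in for the Q network)."""

    def __init__(self, n_actions, w=None, b=None):
        self.w = list(w) if w is not None else [0.0] * n_actions
        self.b = list(b) if b is not None else [0.0] * n_actions

    def operate(self, state):
        """Return one expectation (Q) value per action."""
        return [w * state + b for w, b in zip(self.w, self.b)]

    def train(self, s1, s2, reward, action, gamma=0.9, lr=0.1):
        """Reinforcement step: one Q-learning update; returns a new model."""
        target = reward + gamma * max(self.operate(s2))
        td_error = target - self.operate(s1)[action]
        new = TinyQN(len(self.w), self.w, self.b)
        new.w[action] += lr * td_error * s1
        new.b[action] += lr * td_error
        return new


def coordinated_step(rnn, qn, prev_state, obs1, environment):
    """One pass through the steps of claim 10, in the recited order."""
    s1, _pred1 = rnn.operate(prev_state, obs1)         # first state, first prediction
    q_values = qn.operate(s1)                          # expectation values
    action = max(range(len(q_values)), key=q_values.__getitem__)
    reference, obs2 = environment(action)              # indication out, values in
    rnn2 = rnn.train(prev_state, obs1, reference)      # supervised-learning update
    s2, _pred2 = rnn2.operate(s1, obs2)                # second state, second prediction
    qn2 = qn.train(s1, s2, reference, action)          # reinforcement-learning update
    return rnn2, qn2, s1, s2, action


def environment(action):
    """Hypothetical stand-in for the communications interface."""
    return 1.0, 0.5  # (reference result value, second observation value)


rnn1, qn1 = TinyRNN(), TinyQN(n_actions=2)
rnn2, qn2, s1, s2, action = coordinated_step(rnn1, qn1, 0.0, 1.0, environment)
```

In this sketch each training call returns a new model object rather than modifying the existing one, which mirrors the claim language distinguishing the "first" and "second" computational models; an in-place update would serve equally well in practice.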

	
1.	Claims 1, 5-10, 12-14, 16 and 20 are allowed.	
  			           REASONS FOR ALLOWANCE
2.	The following is an Examiner’s statement for reasons for allowance: 

3.	Claims 1, 5-10, 12-14, 16 and 20 are considered allowable because, when the claims are read in light of the specification, as per MPEP §2111.01 or Toro Co. v. White Consolidated Industries Inc., 199 F.3d 1295, 1301, 53 USPQ2d 1065, 1069 (Fed. Cir. 1999), none of the references of record, alone or in combination, discloses or suggests the combination of limitations specified in the independent claim(s).
4.	The limitations recited in independent claim 1: “…using a supervised-learning update rule to train the first RNN computational model based at least in part on the predicted result value and a corresponding reference result value received via the communications interface in response to the indication to provide a second RNN computational model;
operating the second RNN computational model on the first state information and a second observation value received via the communications interface in response to the indication to provide second state information and a second predicted result value; and
using a reinforcement-learning update rule to train the first QN computational model based at least in part on the first state information, the second state information, the reference result value, and the selected action to provide a second QN computational model.”
5.	The limitations recited in independent claim 10: “…training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;
operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and
training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.”
6.	The limitations recited in independent claim 16: “…training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;
operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and
training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.”
7.  For claims 1, 5-10, 12-14, 16 and 20, the prior art of Wang et al. (US 2017/0140266A1) does teach a QN, trained by reinforcement learning, that provides expectation (Q) values to select actions. However, no prior art of record covers the claim limitations recited above.
8.	When taken in context, the claim(s) as a whole was/were not uncovered in the prior art; i.e., the dependent claims are allowed as they depend upon an allowable independent claim.

7.	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments regarding Statement of Reasons for Allowance.”


Correspondence Information
9.  Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABABACAR SECK, whose telephone number is (571)270-7146.  The examiner can normally be reached Monday-Friday, 8:00 A.M.-6:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ABABACAR SECK/Examiner, Art Unit 2122                                                                                                                                                                                                        
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122