Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1,4-8,11-15,18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Shedivat et al (“Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments”, ICLR 2/23/2018, pp 1-21) in view of Liu (20200279155).

As per claim 1, Al-Shedivat et al (“Continuous Adaptation…”) teaches a method for training a reinforcement learning neural network model (as MAML in reinforcement learning setting – fig. 1a), comprising:
receiving a distribution including a plurality of related tasks (as distribution of tasks – pp3, section 3.1); 
training parameters for a reinforcement learning neural network model based on gradient estimation associated with the parameters using samples associated with the plurality of related tasks (as gradient estimates of MAML constructs – pp 3, last 5 lines, and the following equation with the gradient (“del”, the upside-down triangle) operator),
wherein control variates are incorporated into the gradient estimation by automatic differentiation (as automatic gradient-del calculations – pp4, algorithm 1, and up to section 3.2, with the gradient-del operator).
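
For illustration only, the role a control variate plays in a sampled gradient estimate can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Al-Shedivat et al or of the claims: the Gaussian sampling distribution, the function f(x) = x**2, the constant baseline, and the name score_function_gradients are all hypothetical, and the score term is written out by hand rather than obtained by automatic differentiation.

```python
import random
import statistics

def score_function_gradients(theta, baseline, n_samples=50000, seed=0):
    """Per-sample estimates of d/dtheta E[f(x)], with x ~ Normal(theta, 1).

    Hypothetical toy setup: f(x) = x**2, so the exact gradient is 2 * theta.
    Returns (plain, with_cv): samples without and with a control variate.
    """
    rng = random.Random(seed)
    plain, with_cv = [], []
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)
        score = x - theta              # d/dtheta log N(x; theta, 1)
        f = x * x
        plain.append(score * f)
        # Subtracting a constant baseline leaves the mean unchanged because
        # E[score] = 0; only the variance of the estimate changes.
        with_cv.append(score * (f - baseline))
    return plain, with_cv

theta = 1.0
# Assumed baseline: the expected value of f, which is theta**2 + 1 here.
plain, with_cv = score_function_gradients(theta, baseline=theta**2 + 1.0)
print("exact gradient:", 2 * theta)
print("plain   mean/var:", statistics.fmean(plain), statistics.variance(plain))
print("with CV mean/var:", statistics.fmean(with_cv), statistics.variance(with_cv))
```

Both lists average to roughly the same gradient; the baseline-corrected list should show the smaller sample variance, which is the variance-reduction effect the control variate is meant to provide.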
	
As per claim 1, Al-Shedivat et al (“Continuous Adaptation…”) further teaches gradient calculation on differing levels (i.e., more than one gradient – see pp 5, first half, including text following equation 8, and further details below; in addition, these gradients have control variates tied to the dynamics of the task on one level and to the tasks themselves on another level, and hence read on the current claim scope of a first and second gradient); however, Al-Shedivat et al does not explicitly teach some gradients having control variates and other gradients not having a control variate. Liu (20200279155) teaches a ‘blending’ of gradients, one of which has a control variate, while the other gradient does not (para 0023, and in further detail, para 0033). Therefore, it would have been obvious to one of ordinary skill in the art of ML neural network processes to modify the control variate in the gradient calculations of Al-Shedivat et al (“Continuous Adaptation…”) with a blended gradient (containing gradients with and without control variates) as taught by Liu (20200279155) because it would advantageously improve upon the variance reduction (Liu (20200279155), para 0023, 0024).
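
For illustration only, a blend of one gradient estimate that carries a control variate and one that does not can be sketched as follows. The blending scheme of Liu is not reproduced in this action, so the fixed weight alpha, the two-action policy, the reward values, and the function names are all assumptions made for the sketch.

```python
import math
import random

def blended_gradient_samples(theta, alpha, n_samples=20000, seed=0):
    """Per-sample gradient estimates for a two-action toy policy.

    Hypothetical setup: action 1 is taken with probability sigmoid(theta)
    and pays reward 3.0; action 0 pays reward 1.0.
    Returns (plain, blended) lists of per-sample estimates.
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-theta))
    rewards = (1.0, 3.0)
    baseline = (1.0 - p) * rewards[0] + p * rewards[1]   # expected reward
    plain, blended = [], []
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        score = a - p                                    # d/dtheta log pi(a)
        g_plain = score * rewards[a]                     # no control variate
        g_cv = score * (rewards[a] - baseline)           # with control variate
        plain.append(g_plain)
        # Assumed blend: a fixed convex combination of the two estimates.
        blended.append(alpha * g_cv + (1.0 - alpha) * g_plain)
    return plain, blended

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

plain, blended = blended_gradient_samples(theta=0.0, alpha=0.5)
exact = 0.25 * (3.0 - 1.0)   # p * (1 - p) * (r1 - r0) at theta = 0
print("exact:", exact)
print("plain   mean/var:", mean(plain), variance(plain))
print("blended mean/var:", mean(blended), variance(blended))
```

Because both component estimates are unbiased, any fixed convex combination of them is unbiased as well; in this toy case the blended samples should also show a lower variance than the plain samples.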
  
As per claim 4, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches the method of claim 1, wherein the method further comprises: training the control variates to generate trained control variates; and incorporating the trained control variates into the gradient estimation (Al-Shedivat et al (“Continuous Adaptation…”) as meta-learning/training – pp 5, lines 1-7, performing meta gradient steps – pp 5, starting “To construct….”).

As per claim 5, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches the method of claim 4, wherein the training the control variates includes: for each task of the plurality of related tasks, training a separate control variate (Al-Shedivat et al (“Continuous Adaptation…”) as adaptation/training, performing a meta-learning process for each transition – pp 5, first 8 lines).

As per claim 6, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches the method of claim 5, wherein the training the control variates includes: training the control variates across the plurality of related tasks (Al-Shedivat et al (“Continuous Adaptation…”), pp 5, see meta-learning at training time, across the tasks; P(Ti-1, Ti) is defined as the task dynamics and particular tasks – Al-Shedivat et al (“Continuous Adaptation…”), pp 5, first 6 lines).

As per claim 7, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches the method of claim 6, wherein the training the control variates across the plurality of related tasks includes: receiving a plurality of episodes corresponding to the plurality of related tasks (Al-Shedivat et al (“Continuous Adaptation…”) as receiving nonstationary distribution of tasks – pp 4, section 3.2, first 6 lines, from “In the….changing environments”);
adapting meta control variates parameters for control variates using a first portion of the episodes associated with a first portion of the tasks to generate first adapted meta control variates parameters; generating first meta control variates estimates associated with a second portion of the tasks based on the first adapted meta control variates parameters (Al-Shedivat et al (“Continuous Adaptation…”) as meta-learning of tasks – pp 5, first 6 lines and equation, wherein the first portion is the upper level (task dynamics) and the second portion is the lower level (particular tasks));
adapting the meta control variates parameters using a second portion of the episodes associated with the second portion of the tasks to generate second adapted meta control variates parameters; generating second meta control variates estimates associated with the first portion of the tasks based on the second adapted meta control variates parameters (Al-Shedivat et al (“Continuous Adaptation…”), see pp 5, equation 7, showing the multi-gradient steps with adaptive steps);
and updating the meta control variates parameters using the first and second meta control variates estimates (Al-Shedivat et al (“Continuous Adaptation…”) as updating the overall policy gradient – pp 5, from 2 lines before equation 7, to 5 lines past equation 8).
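
For illustration only, the adapt-on-one-portion, estimate-on-the-other structure recited in claim 7 can be sketched with a single scalar parameter. This is a minimal sketch and is neither the applicant's implementation nor the procedure of Al-Shedivat et al: the scalar baseline w standing in for the meta control variates parameters, the squared-error loss, the learning rates, and the episode returns are all assumptions.

```python
def adapt(w, returns, inner_lr):
    """One gradient step on mean((r - w)**2), starting from meta parameter w."""
    grad = -2.0 * sum(r - w for r in returns) / len(returns)
    return w - inner_lr * grad

def cross_meta_gradient(w, part_a, part_b, inner_lr):
    """Gradient w.r.t. w of loss(adapt(w, A); B) + loss(adapt(w, B); A)."""
    w_a = adapt(w, part_a, inner_lr)      # first adapted parameters
    w_b = adapt(w, part_b, inner_lr)      # second adapted parameters
    # Mean residual of each adapted baseline evaluated on the other portion.
    resid_b = sum(r - w_a for r in part_b) / len(part_b)
    resid_a = sum(r - w_b for r in part_a) / len(part_a)
    d_adapt_dw = 1.0 - 2.0 * inner_lr     # derivative of adapt() w.r.t. w
    return d_adapt_dw * (-2.0 * resid_b) + d_adapt_dw * (-2.0 * resid_a)

def train_meta_baseline(part_a, part_b, inner_lr=0.1, outer_lr=0.1, steps=200):
    """Update the meta parameter using both cross-portion estimates."""
    w = 0.0
    for _ in range(steps):
        w -= outer_lr * cross_meta_gradient(w, part_a, part_b, inner_lr)
    return w

# Hypothetical episode returns for the two portions of the tasks.
returns_a = [1.0, 2.0, 3.0]
returns_b = [5.0, 7.0]
w_star = train_meta_baseline(returns_a, returns_b)
print(w_star)
```

In this toy setting the meta parameter settles midway between the mean returns of the two portions, since each adapted baseline is always scored on the portion it was not adapted to.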

	Claims 8,11-14 are non-transitory machine readable medium claims performing the method steps of claims 1,4-7 above and as such, claims 8,11-14 are similar in scope and content to claims 1,4-7 above and therefore, claims 8,11-14 are rejected under similar rationale as presented against claims 1,4-7 above.  Furthermore, Al-Shedivat et al (“Continuous Adaptation…”) teaches performing the equations on a computing device to generate the graphs – pp 10-11, starting with section 5.3.

	Claims 15,18-20 are device claims that perform the method steps of claims 1,4-7 above and as such, claims 15,18-20 are similar in scope and content to claims 1,4-7 above and therefore, claims 15,18-20 are rejected under similar rationale as presented against claims 1,4-7 above.  Furthermore, Al-Shedivat et al (“Continuous Adaptation…”) teaches performing the equations on a computing device to generate the graphs – pp 10-11, starting with section 5.3.


Claims 2,9,16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155)  in further view of Alakuijala et al (20200142888).

As per claim 2, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches a gradient estimator by automatic differentiation and removing bias, as applied to claim 1 above; however, the combination does not explicitly teach using a Monte Carlo type algorithm. Alakuijala et al (20200142888) teaches a Monte Carlo calculation in a similar space (para 0070). Therefore, it would have been obvious to one of ordinary skill in the art of machine learning network design to modify the gradient calculation of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) with a Monte Carlo calculation as taught by Alakuijala et al (20200142888), because it would advantageously improve the reward score because of updating during the interaction (Alakuijala et al (20200142888), para 0017).
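
For illustration only, a Monte Carlo type calculation of the kind referred to above can be sketched as an average over sampled episodes. The toy environment, its reward probability, the discount factor, and the function names are assumptions and are not taken from Alakuijala et al or from the claims.

```python
import random

def monte_carlo_return(p_reward, gamma, horizon, n_episodes=20000, seed=0):
    """Average discounted return over sampled episodes.

    Hypothetical environment: each step pays reward 1 with probability
    p_reward and 0 otherwise, independently of any action.
    """
    rng = random.Random(seed)
    total = 0.0
    for _episode in range(n_episodes):
        discount = 1.0
        for _step in range(horizon):
            if rng.random() < p_reward:
                total += discount
            discount *= gamma
    return total / n_episodes

def exact_return(p_reward, gamma, horizon):
    """Closed-form expected discounted return for the same toy environment."""
    return p_reward * (1.0 - gamma ** horizon) / (1.0 - gamma)

estimate = monte_carlo_return(0.5, 0.9, 10)
print(estimate, exact_return(0.5, 0.9, 10))
```

The sampled average should approach the closed-form value as the number of episodes grows, which is the property that makes a Monte Carlo estimate usable inside a gradient calculation.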

	Claims 9,16 are non-transitory machine readable medium and device claims, respectively, that perform the method step of claim 2 above and as such, claims 9,16 are similar in scope and content to claim 2 above and therefore, claims 9,16 are rejected under similar rationale as presented against claim 2 above.  Furthermore, Al-Shedivat et al (“Continuous Adaptation…”) teaches performing the equations on a computing device (including executing the disclosed equations requiring a storage medium executing program steps) to generate the graphs – pp 10-11, starting with section 5.3. 

Claims 3,10,17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) in further view of Desjardins (20190236482).

As per claim 3, the combination of Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) teaches the method of claim 1, including automatic differentiation (as applied to claim 1 above); however, the combination does not explicitly teach using a double differentiation (differentiating the gradient, also known as the trace of a Hessian). Desjardins (20190236482) teaches the concept of performing a second derivative of the objective function (para 0040). Therefore, it would have been obvious to one of ordinary skill in the art of neural network machine learning design to add a further differentiation step in Al-Shedivat et al (“Continuous Adaptation…”) in view of Liu (20200279155) because it would further optimize the objective function with respect to the task (Desjardins (20190236482), para 0040).
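
For illustration only, differentiating a gradient a second time by automatic differentiation can be sketched with nested forward-mode dual numbers. This is a minimal single-variable sketch; the Dual class, the derivative helper, and the example function are hypothetical and are not drawn from Desjardins or from the claims.

```python
class Dual:
    """Minimal forward-mode dual number: a value plus a derivative part."""

    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried in the derivative part.
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__

def derivative(f):
    """Return a function that computes df/dx by seeding the derivative part with 1."""
    def df(x):
        return f(Dual(x, 1.0)).eps
    return df

def f(x):
    # Hypothetical objective: f(x) = x^3 + 2x^2, so f'(x) = 3x^2 + 4x
    # and f''(x) = 6x + 4.
    return x * x * x + 2 * x * x

first = derivative(f)(2.0)                # single differentiation
second = derivative(derivative(f))(2.0)   # differentiating the gradient
print(first, second)
```

Applying derivative twice nests one dual number inside another, so the second derivative falls out of the same product rule without any symbolic manipulation. The sketch handles only polynomial expressions in one variable that use multiplication and addition.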

Claims 10,17 are non-transitory machine readable medium and device claims, respectively, that perform the method step of claim 3 above and as such, claims 10,17 are similar in scope and content to claim 3 above and therefore, claims 10,17 are rejected under similar rationale as presented against claim 3 above. Furthermore, Al-Shedivat et al (“Continuous Adaptation…”) teaches performing the equations on a computing device (including executing the disclosed equations requiring a storage medium executing program steps) to generate the graphs – pp 10-11, starting with section 5.3.

Response to Arguments

Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner notes the use of the Liu (20200279155) reference, which teaches a blending of gradient types so as to use gradients with control variates and gradients without control variates. Applicant repeats the teachings of Al-Shedivat et al (“Continuous Adaptation…”) but does not proffer a comparison or contrast of the cited portions of Al-Shedivat et al with the amended claim scope; that argument is now rendered moot by the newly formed combination of Al-Shedivat et al (“Continuous Adaptation…”) and Liu (20200279155).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Examiner notes the following references, teaching relevant aspects towards reinforcement learning and gradients across datasets:
Virkar et al (20100063948) abstract, paras 0070, 0086, 0149-0151
Varadarajan et al (20190095818) abstract, Fig. 1, para 0089, 0097-0098, 0124-0126, 0158-0163
Singh et al (20200074296) para 0016, 0035, 0069-0071 
N et al (20190130292) fig 13, fig 17 (Monte Carlo).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
11/16/2022