Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on July 21, 2022, in which claims 1-2, 4-14 and 16-32 are currently amended.  Claim 33 is newly added.  Claims 1-33 are currently pending. 

Response to Arguments
Applicant’s arguments with respect to the interpretation of claims 1, 5, 8, 10-12, 17, 20 and 22-32 under 35 U.S.C. § 112(f) have been considered but are not persuasive.  Changing the term “adapted to” to “configured to” does not overcome the pairing of a nonce term with functional language that triggered the claim interpretation.  For this reason, Examiner maintains the interpretation under 35 U.S.C. § 112(f).
The rejections of claims 1-32 under 35 U.S.C. § 101 are hereby withdrawn, as necessitated by Applicant's amendments and remarks.
Applicant’s arguments with respect to the rejection of claims 1-32 under 35 U.S.C. § 103, based on the amendments, have been considered and are persuasive.  However, the arguments are moot in view of the new grounds of rejection set forth below.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations, all depending from the system of claim 1, are as follows:
Claim 1: “an inner function computation module, configured to…”
Claim 1: “an error computation module, configured to…”
Claim 1: “a state update module, configured to…”
Claims 5/17: “a state to parameter mapping module, configured to…”
Claims 8/20/22-23: “a state change penalizing module, configured to…”
Claim 10: “a learning decision module, configured to…”
Claims 11/28/31: “a state combination module, configured to…”
Claims 12/29/32: “a learning strength penalizing module, configured to…”
Claims 24-27/30: “a learning decision module, configured to…”
Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The corresponding structure of the noted modules is read in light of the instant specification at [0023], which describes a module as comprising processor hardware or a circuit that may be shared, dedicated, or group. Examiner interprets this as a general-purpose computer.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 8-9, 14-18, and 20-23 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Finn (“Meta-Learning and Universality: Deep Representations and Gradient Descent Can Approximate Any Learning Algorithm”, 2017) and Sung (“Learning to Learn: Meta-Critic Networks for Sample Efficient Learning”, 2017), and further in view of Schaal (“Learning Control in Robotics”, 2010).

	Regarding claim 1, Finn teaches A meta-learning system, comprising: an inner function computation module, configured to compute output data from applied input data according to an inner model function, the inner model function depending on model parameters; ([p. 3 §3] "In model-agnostic meta-learning (MAML), instead of using an RNN to update the weights of the learner f, standard gradient descent is used. the prediction yˆ* for a test input x* is: y^* = f_MAML(D_T, x*;θ)...where θ denotes the initial parameters of the model f and also corresponds to the parameters that are meta-learned")
	an error computation module, configured to compute an error indicating a mismatch between the output data and a target value; and ([P.4 §4] “universal learning algorithm approximator corresponds to the ability of a meta-learner to represent any function ftarget(x,y,x*)… error gradient…” See also gradient equation.)
	a state update module, configured to update the model parameters of the inner model function according to an updated state, the update state being based on a current state and the error, (Eqs. 1-7 [Sect.4] for one-shot and/or k-shot [Sect.5], either of which is supported by loss/error as detailed in [Sect.6] or the proof of the theorem per [Appendix A-B]. The claimed state corresponds to the “gradient step” (used throughout Finn), as step = state and the variables are indexed by subscript. The effect is described as “perfect training accuracy… MAML-initialized model does not begin to overfit, even after 100 gradient steps” Fig 6 [P.19 App.E], [P.8] Fig 4).
	However, Finn does not explicitly teach the state update module being trained to adjust the model parameters before the inner model function is trained, the inner model function being trained to minimize a penalty, and the penalty being based on one of a change between the current state and the updated state, or a change between the error and a previous error  
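For illustration only, the gradient-step update relied upon from Finn can be pictured with the following minimal sketch. It is not Finn's implementation and is not a characterization of the claims. A one-dimensional linear model stands in for the learner f, and the names predict, squared_error, and gradient_step, as well as the step size alpha, are assumptions introduced here.

```python
# Illustrative sketch only: a linear model f(x; theta) = w*x + b stands in
# for the learner f. All function names and the step size are hypothetical.

def predict(theta, x):
    """Compute output data from input data using parameters theta = (w, b)."""
    w, b = theta
    return w * x + b

def squared_error(theta, x, y):
    """Error indicating the mismatch between the output and the target y."""
    return (predict(theta, x) - y) ** 2

def gradient_step(theta, x, y, alpha=0.1):
    """One gradient-descent step on the squared error for a single example."""
    w, b = theta
    residual = predict(theta, x) - y   # output minus target
    grad_w = 2.0 * residual * x        # d(error)/dw
    grad_b = 2.0 * residual            # d(error)/db
    return (w - alpha * grad_w, b - alpha * grad_b)

theta0 = (0.0, 0.0)                    # stands in for the initial parameters
theta1 = gradient_step(theta0, x=1.0, y=2.0)
```

In this sketch each call to gradient_step takes the current parameters and the error gradient and returns the next parameters, which is the sense in which a step is read as a state above.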

Sung, in the same field of endeavor, teaches the state update module being trained to adjust the model parameters before the inner model function is trained, ([p. 4 §3] "the optimisation procedure is to alternatively update policy network and value network" [p. 6 §3.2] "Here we can see that the function approximator (actor) is learning to maximize the negative supervised learning loss, as estimated by the meta-critic, rather than minimise a fixed loss function as in regular SL. Meanwhile the meta-critic learns to simulate the actual supervised learning loss of each problem i = 1 . . . M." Critic/value network interpreted as synonymous with state update module.  Actor/task/policy network interpreted as synonymous with inner model. See also Algorithm 1 l. 11-12.). 

	Finn and Sung are both directed towards meta-learning, therefore, Finn and Sung are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Finn with the teachings of Sung by applying the features of meta-learning in Finn to the meta-reinforcement-learning model of Sung, particularly the actor/critic penalty/reward system.  Sung reinforces the obviousness of applying meta-learning to reinforcement-learning models, and further provides as a motivation for combination ([p. 2 §1] "To understand why the meta-critic approach is effective in RL, consider that if the meta-critic can correctly criticise a new task based on the provided task-encoding, then from the perspective of the new task’s actor, it benefits from a pre-trained critic which increases learning speed. Moreover, as the meta-critic is actor-conditional, it never gets ‘out of date’ and needing to ‘catch-up’ with its actor, as can happen during actor critic co-evolution in conventional actor-critic architectures.").  This motivation for combination also applies to the remaining claims which depend on the combination. 

	While it would be obvious to one of ordinary skill in the art that reward and penalty in reinforcement-learning are antonyms which can either be maximized or minimized to achieve similar results, the combination of Finn and Sung does not explicitly teach the inner model function being trained to minimize a penalty, and the penalty being based on one of a change between the current state and the updated state, or a change between the error and a previous error  

Schaal, in the same field of endeavor, teaches the inner model function being trained to minimize a penalty, and the penalty being based on one of a change between the current state and the updated state, or a change between the error and a previous error ([p. 25 Col. 2] "instead of the value function V(x), the action value function Q(x,u) can be used, which is defined as [See Eqn.] [7], [46]. Knowing Q(x,u) for all actions in a state allows choosing the one with the maximal (or minimal for penalty costs) Q-value as the optimal action. Q-learning can be conceived of as TD learning in the joint space of states and actions."). 
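As a hedged illustration of the quoted passage, the sketch below selects the action with the maximal Q-value when the values represent rewards and the minimal Q-value when they represent penalty costs. The function name greedy_action, the flag costs_are_penalties, and the example values are assumptions made here and do not appear in Schaal.

```python
# Illustrative sketch only: greedy action selection over Q(x, u) for one
# fixed state x. Names and numeric values are hypothetical.

def greedy_action(q_values, costs_are_penalties=False):
    """q_values maps each action u to Q(x, u) for a fixed state x."""
    chooser = min if costs_are_penalties else max
    return chooser(q_values, key=q_values.get)

q = {"left": 1.5, "stay": 0.2, "right": 3.0}
best_for_reward = greedy_action(q)                              # maximal Q
best_for_penalty = greedy_action(q, costs_are_penalties=True)   # minimal Q
```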

	The combination of Finn and Sung, as well as Schaal, are directed towards reinforcement-learning.  Therefore, the combination of Finn and Sung, as well as Schaal, are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Finn and Sung with the teachings of Schaal by minimizing a penalty in a reinforcement-learning model.  Schaal provides as an additional motivation for combination ([p. 27 Col. 1] "Learning was performed on a physical simulator of the robot dog, as the real robot dog was not available for this experiment. Figure 5 illustrates that after about 30 trials, the performance of the robot was significantly improved").  This motivation for combination also applies to the remaining claims depending on this combination.

	Regarding claim 2, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1, the state update module is learned using labelled learning data applied to adjust the model parameters; and (Finn [P.2 ¶5] “supervised classification” is labeled learning, see [P.5 ¶6] “When the training input x is passed in, we need fout to propagate information about the label y as defined in Equation 2”)
	the inner model function is trained using training data (Finn [P.7 ¶4] “back-propagating information about the label” per Eq.4). 

	Regarding claim 3, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1, wherein the inner function computation module includes a neural network implementing the inner model function. (Finn [P.5 ¶6-7] “we will define fout as a neural network… and hpost(·;θh) is a neural network”, [P.2 Last¶] “MAML is compatible with any neural network”). 

	Regarding claim 4, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 3, wherein the model parameters of the neural network include weights and biases; and the state update module is configured to update the model parameters by changing the weights and biases according to the updated state. (Finn [P.4 Last¶] “bias transformation variable θb… gradient with respect to each weight matrix W”). 

	Regarding claim 5, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1, further comprising: a state to parameter mapping module, configured to map the updated state to the model parameters according to a mapping function. (Sung [P.4 Sect.3] Eqs. 1-2 is mapping where θ is parameter/weight and “at each time t, an agent receives a state st… new state st+1”). 

	Regarding claim 6, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 5, wherein the state to parameter mapping module is configured to map the updated state to the model parameters according to a mapping function. (Sung [P.4 §3] Eqs.1-2 wherein the predetermined mapping function is argmax/argmin). 

	Regarding claim 8, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1, wherein the penalty is a state change penalty; and the meta-learning system further comprises: a state change penalizing module, configured to associate the state change penalty with the change between the current state and the updated state. (Sung [P.4-5] per Eqs. 2 and 4 which compares difference between current state st and the updated state st+1 with penalty being reward of Q-network policy). 

	Regarding claim 9, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 8, wherein the inner model function is trained to minimize the errors, and the state change penalty. (Sung [P.4-5] per Eqs. 2 and 4 wherein “argmin” is minimization over states and with reward as error. A skilled artisan would have considered it obvious prior to the effective filing date to minimize error and state as in Sung because it leads to higher model accuracy - it would not be logical to maximize error). 

	Regarding claim 14, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1, wherein the state update module is configured to update the state of state update module, depending on a gradient of the error, with respect to the model parameters. (Finn [P.2 ¶5] “gradient steps on θ” is a gradient with respect to parameters where step is state and updates may be few-shot multi-step. Additional updates may be of the form z per Eqs. 1-3 [P.5] and self-connection is considered with regard to recurrent neural networks [P.3]). 

	Regarding claim 15, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 3, wherein the inner function computation module includes a deep neural network implementing the inner model function. (Finn Fig 1 illustrates, function is “deep fully-connected neural network” as described throughout, e.g., [P.4 ¶3], [P.5 ¶5], [P.18 Sect.D ¶1]). 

	Regarding claim 16, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 15, wherein the model parameters of the deep neural network include weights and biases;  and the state update module is configured to update the model parameters by changing the weights and biases according to the updated state. (Finn [P.4 Last¶] “bias transformation variable θb… gradient with respect to each weight matrix W”). 

	Regarding claims 17-18 and 20-23, claims 17-18 are substantially similar to claims 5-6, respectively, claims 20-21 are substantially similar to claims 8-9, respectively, and claims 22-23 are substantially similar to claim 8.  Therefore, the rejections applied to claims 5-6 and 8-9 also apply to claims 17-18 and 20-23.

	Claims 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Finn, Sung, and Schaal and in further view of Wang (“Learning to Model the Tail”, 2017).

	Regarding claim 7, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 6.
	However, the combination of Finn, Sung, and Schaal does not explicitly teach, wherein the mapping function is an identity function.  

Wang, in the same field of endeavor, teaches The meta-learning system of claim 6, wherein the mapping function is an identity function. ([P.2 ¶2] “meta-network that simply acts as an identity function, returning the input set of model parameters” and again at [P.5 Sect.3.2] “meta-learner defaults to the identity function”).

	The combination of Finn, Sung, and Schaal, as well as Wang are directed towards meta-learning.  Therefore, the combination of Finn, Sung, and Schaal as well as Wang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Finn, Sung, and Schaal with the teachings of Wang by ensuring that the mapping function is an identity function. Wang provides as an additional motivation for combination ([P.7 Last¶], [P.2 ¶2] “identity regularization provide a noticeable performance boost” thus “effectively capturing the gradual dynamics of transferring meta-knowledge from data-rich to data-poor regimes”).  
	Regarding claim 19, claim 19 is substantially similar to claim 7, therefore, the rejection applied to claim 7 also applies to claim 19.  

Claims 10-13 and 24-33 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Finn, Sung, and Schaal, and further in view of Schweighofer (“Meta-learning in Reinforcement Learning”, 2002).

	Regarding claim 10, the combination of Finn, Sung, and Schaal teaches The meta-learning system of claim 1.
	While Sung explicitly teaches changing a learning rate of a model ([p. 2 §1]), the combination of Finn, Sung, and Schaal does not explicitly teach a learning decision module, configured to compute a learning strength based on the error.  

Schweighofer, in the same field of endeavor, teaches a learning decision module, configured to compute a learning strength based on the error. ([p. 1 §1] "Therefore, any deviation from the consistency equation [See Eqn.] should be zero on average. This signal is the TD error and is used as the teaching signal to learn the value function: where a is a learning rate." learning rate interpreted as synonymous with learning strength.). 
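For illustration only, a temporal-difference update of the general kind referenced in the quoted passage can be sketched as follows, with the learning rate scaling how strongly the TD error changes the value estimate. This is a standard textbook TD(0) form offered under stated assumptions and is not asserted to be Schweighofer's exact equation. The names td_update, values, alpha, and gamma, and the numeric values, are hypothetical.

```python
# Illustrative sketch only: a tabular TD(0) value update. The TD error is the
# deviation from the consistency relation V(x) = r + gamma * V(x').
# All names and numbers are hypothetical.

def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Update values[state] in place and return the TD error."""
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta     # alpha is the learning rate
    return delta

values = {"s0": 0.0, "s1": 1.0}
delta = td_update(values, "s0", "s1", reward=0.5)
```

A larger alpha moves the value estimate further for the same error, which is the relationship between learning rate and error relied upon above.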

The combination of Finn, Sung, and Schaal, as well as Schweighofer are directed towards meta-learning.  Therefore, the combination of Finn, Sung, and Schaal as well as Schweighofer are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Finn, Sung, and Schaal with the teachings of Schweighofer by modifying the learning rate parameter of the model based on the error.  Schweighofer provides as an additional motivation for combination ([p. 6 §1.2] "The model improves learning performance by automatically setting near optimal synaptic learning rates for each synapse").  This motivation for combination also applies to the remaining claims which depend on this combination. 

	Regarding claim 11, the combination of Finn, Sung, Schaal, and Schweighofer teaches The meta-learning system of claim 10, further comprising: a state to parameter mapping module configured to map the updated state to the model parameters; and (Sung [P.4 §3] Eqs.1-2 wherein the predetermined mapping function is argmax/argmin)
	a state combination module, configured to output one of the current state or the updated state, to the state to parameter mapping module based on the learning strength. (Schweighofer [p. 1 §1] "The policy is usually defined via the action value function...which represents how much future rewards the agent would get by taking the action a at state x(t) and following the current policy in subsequent steps. One common way for stochastic action selection that encourages exploitation is to compute the probability to take an action by the soft-max function: [See Eqn.]" Schweighofer shows that the action probability related to a particular state change is based on the learning rate.). 
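As a hedged illustration of the soft-max action selection quoted above, the sketch below converts action values into selection probabilities. The scaling parameter beta, the function name softmax_policy, and the example values are assumptions introduced here, and the exact parameterization in Schweighofer may differ.

```python
import math

# Illustrative sketch only: soft-max action probabilities over Q(x, a) for a
# fixed state x. The parameter beta and all names are hypothetical.

def softmax_policy(q_values, beta=1.0):
    """Return a dict mapping each action to its selection probability."""
    m = max(q_values.values())         # subtract the max for numerical stability
    weights = {a: math.exp(beta * (q - m)) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

probs = softmax_policy({"a1": 1.0, "a2": 2.0}, beta=1.0)
uniform = softmax_policy({"a1": 1.0, "a2": 2.0}, beta=0.0)
```

With beta set to zero every action is equally likely, and larger values of beta concentrate probability on the action with the highest value.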

	Regarding claim 12, the combination of Finn, Sung, Schaal, and Schweighofer teaches The meta-learning system of claim 11, wherein the penalty is a learning strength penalty; and (Schweighofer [p. 1 §1] "The policy is usually defined via the action value function...which represents how much future rewards the agent would get by taking the action a at state x(t) and following the current policy in subsequent steps. One common way for stochastic action selection that encourages exploitation is to compute the probability to take an action by the soft-max function: [See Eqn.]" Schweighofer shows that the action probability and related reward is based on the learning rate.  Schaal explicitly teaches that the penalty and reward are synonymous.  It would be obvious to one of ordinary skill in the art to combine the two references to teach a learning strength penalty, and would lead to obvious and expected results.)
	the meta-learning system further comprises a learning strength penalizing module, configured to associate the learning strength penalty with a current magnitude of the learning strength. (Schaal [p. 27 Col. 1] "Path-integral reinforcement learning primarily used the forward progress as a reward and slightly penalized the squared acceleration of each DoF and the squared norm of the parameter vector" Penalized squared norm of the parameter vector interpreted as synonymous with associating a penalty with a magnitude of the learning strength.  Schweighofer explicitly teaches that the learning strength or learning rate is part of the parameter vector and it would be obvious to combine the regularized penalization in Schaal with the parameter vector of Schweighofer.  This would lead to obvious and expected results.). 
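For illustration only, penalizing the squared norm of a parameter vector, as in the passage quoted from Schaal, can be sketched as below. The penalty weight, the function name penalized_objective, and the example numbers are assumptions made here. Schaal's actual cost function also includes an acceleration term that this sketch omits.

```python
# Illustrative sketch only: a reward reduced by a penalty proportional to the
# squared norm of the parameter vector. Names and values are hypothetical.

def penalized_objective(forward_progress, theta, weight=0.01):
    """Return forward progress minus weight times the squared norm of theta."""
    squared_norm = sum(p * p for p in theta)
    return forward_progress - weight * squared_norm

score = penalized_objective(1.0, [3.0, 4.0], weight=0.01)
```

The penalty grows with the magnitude of the parameters, so a larger parameter vector lowers the objective for the same forward progress.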

	Regarding claim 13, the combination of Finn, Sung, Schaal, and Schweighofer teaches The meta-learning system of claim 12, wherein the inner model function is trained to minimize the errors and the learning strength penalty. (Schaal [p. 25 Col. 2] "instead of the value function V(x), the action value function Q(x,u) can be used, which is defined as [See Eqn.] [7], [46]. Knowing Q(x,u) for all actions in a state allows choosing the one with the maximal (or minimal for penalty costs) Q-value as the optimal action. Q-learning can be conceived of as TD learning in the joint space of states and actions."). 

Regarding claims 24-27, 30, and 33, claims 24-27, 30, and 33 are substantially similar to claim 10.  Therefore, the rejection applied to claim 10 also applies to claims 24-27, 30, and 33.

Regarding claims 28-29 and 31-32, claims 28 and 31 are substantially similar to claim 11, and claims 29 and 32 are substantially similar to claim 12. Therefore, the rejections applied to claim 11 also apply to claims 28 and 31, and the rejections applied to claim 12 also apply to claims 29 and 32. 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124