DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
Acknowledgement is made of Applicant's claim amendments on 8/11/2022. The claim amendments are entered. Presently, claims 1-20 remain pending. Claims 1, 2, 4, 13, 15, 16, 19 and 20 have been amended.

Applicant has sufficiently amended claims 4, 13, and 20 to address the claim objections. Accordingly, the claim objections are withdrawn. 

Applicant has amended claims 15, 16, 19, and 20 to remove the terms that invoke the §112(f) interpretations. Accordingly, the §112(f) interpretations are withdrawn. 
Additionally, because these terms have been removed, they no longer trigger a structural determination, and thus a written description of structural support for these terms is no longer needed. Accordingly, the §112(a) rejections in relation to the §112(f) interpretations are withdrawn. 
Furthermore, the indefiniteness issue regarding the lack of sufficient structure to support these terms is also moot since the terms at issue have been removed. Accordingly, the §112(b) rejections in relation to the §112(f) interpretations are withdrawn. 

Applicant has sufficiently amended the various claims to address the antecedent basis issues and to provide the requisite clarity for claim 5. Accordingly, the §112(b) rejections against the various claims are withdrawn. However, the §112(b) rejection against claim 13 has not been withdrawn because the antecedent issue at step (iv) has not been fixed. It is noted that Applicant had stated that this issue was fixed (see Applicant’s reply pg. 12), so it is clear that Applicant intended to fix this issue, but there was an oversight and it ended up not being fixed in the final submission. Thus, the §112(b) rejection against claim 13 is maintained. The antecedent issue at step (iv) can simply be addressed in the next response. 

Response to Arguments
Applicant's arguments filed on 8/11/2022 have been fully considered and are addressed below.

Applicant argues that the claims allegedly do not recite mental processes because the claims have been amended to incorporate the use of processors with instructions, memory, and computer code instructions (Applicant’s reply pgs. 12-13). Applicant also argues that the claims are allegedly drawn to a technological improvement related to neural network training, citing the specification, various caselaw, and various claim elements (the comparing steps to generate the slow weight value and information value, the parameterizing to update weights via meta learning, and the classification of an input task using the various steps) as enabling more efficient training of a neural network (Applicant’s reply pgs. 12-13). 
First, the argument that incorporating processors with instructions, memory, and computer code instructions is sufficient to overcome the mental process characterization is not persuasive. The recitation of the memory and processor with computer code instructions merely denotes an additional element of mere instructions for applying the judicial exception via generic computing tools (see MPEP 2106.05(f)), while also denoting a generic computing environment (see MPEP 2106.05(h)). The use of generic computing tools to perform the mental process does not amount to anything more than the use of the generic computing environment to perform the mental processes (see MPEP 2106.04(a)(2)(III)(C)). Thus, the incorporation of the processors with instructions, memory, and computer code instructions does not integrate the judicial exception into a practical application and does not amount to significantly more than the judicial exception. As such, this argument is not persuasive. 
Second, the argument regarding the technological improvement has been considered and evaluated. Independent claims 1, 4, and 13 recite the various steps as identified by Applicant with the exception that classifying is only present in claim 1. However, claim 15 does not recite these elements. Rather, claim 15 just recites meta and base learners with a weight memory device for generating and storing fast and slow weights. 
Thus, a further review of claims 1, 4, and 13 in light of the specification lends support that the claim limitations when read as a whole provide a technological improvement, namely an improvement in meta learning accuracy for neural networks. Thus, these independent claims are sufficient to overcome the §101 rejection. Accordingly, the §101 rejection is withdrawn for these claims and their dependent claims. 
Regarding claim 15, as stated above, claim 15 differs from the other claims and does not contain the various limitations describing the meta learning process in detail, wherein such details help to explain the process and why the claimed invention, as recited in the detailed process, would be a technological improvement. In contrast, claim 15 merely recites one-shot learning of a neural network in the preamble, which for the reasons stated below does not have patentable weight, and meta and base learners with a weight memory device for generating and storing fast and slow weights. That is, the claim recites no details of the one-shot learning and meta learning process as described in the specification that is purported to provide a technological improvement on the status quo. As such, claim 15 and its relevant dependent claims do not concretely demonstrate a technological improvement or an integration into a practical application. Furthermore, as noted above, the incorporation of the processors with instructions, memory, and computer code instructions into claim 15 is an additional element reciting the use of generic computing tools, wherein the use of generic computing tools to perform the mental process does not amount to anything more than the use of the generic computing environment to perform the mental processes (see MPEP 2106.04(a)(2)(III)(C)). As such, claim 15 and its relevant dependent claims remain rejected under §101. 

Applicant argues that Jeng allegedly does not teach support examples and asserts that the meta information described in Jeng allegedly does not teach support examples (Applicant’s reply pg. 14). This argument is not persuasive. The data domain set, not the meta information, was used to teach the support set of examples. 

Applicant argues that Jeng in combination with the other cited references allegedly do not teach weight memory devices and that Jeng allegedly does not teach memory (Applicant’s reply pgs. 14-15). This argument is not persuasive. Jeng is not being used to teach the weight memory devices. Additionally, Jeng does teach memory devices because the computers and servers include memory storage components/devices. Thus, these arguments are not persuasive. 

Allowable Subject Matter
The allowable subject matter was previously identified and is being updated and maintained. 

Claims 1-3 are allowable.
The following is a statement of reasons for the indication of allowable subject matter: the prior art references do not explicitly teach the claim limitations.
Li teaches a meta-learner with a “two-level learning process” for classification task learning in a “meta space”, wherein the meta-learning process takes in a plurality of data batches to each meta-learner utilizing few-shot learning techniques. The process involves learning meta parameters, calculating empirical loss via gradient descent, and calculating prediction loss via mean squared error. The process also utilizes weight matrices that are initialized with a certain mean and standard deviation value. However, Li does not explicitly teach the generation of the first and second meta weight values, the comparison of those weight values to generate a slow weight value, storing the slow weight value in memory, the comparison steps to generate a third meta information and a fast weight, the transmission of the third meta information, the storing of the fast weight in memory, and the parameterizing of the meta weight values with the fast weight to update the slow weight. As such, the claim limitations are distinguishable from the reference. 
Gupta teaches a “multipart artificial neural network” (ANN) for meta-learning to classify questions. Input data can be received by the ANN to train the ANN on auxiliary tasks to assign labels to the input and to make predictions related to relationships between the words. The meta-learning process uses supervised learning to make a prediction in a hypothesis space that includes computing a loss function with regards to the prediction. The process also includes calculating weight values for the ANN and its nodes. In addition, Gupta also teaches a memory device for storing information. However, Gupta does not explicitly teach the generation of the first and second meta weight values, the comparison of those weight values to generate a slow weight value, storing the slow weight value in memory, the comparison steps to generate a third meta information and a fast weight, the transmission of the third meta information, the storing of the fast weight in memory, and the parameterizing of the meta weight values with the fast weight to update the slow weight. As such, the claim limitations are distinguishable from the reference.
Vilalta teaches data classification via a classification system that uses a meta-learning process that includes model selection and generation of meta-rules and meta-features. The system receives a training domain data set and generates meta-features from the domain data set. Then, “the meta-learning model selection process 700 attempts to look for correlations among the meta-features to generate rules that show when to assign a specific model to a certain domain. The selection process 800 identifies the best matching rule to assign a model to a new domain. The meta-rules generation process 900 looks for correlations between meta features and models to determine when a model is best for a specific domain dataset.” The meta-learning process also involves calculating weighted distances between the training sample data sets to measure the variability of the class label associated with the training sample data sets. In addition, Vilalta also teaches a memory device for storing instructions. However, Vilalta does not explicitly teach the generation of the first and second meta weight values, the comparison of those weight values to generate a slow weight value, storing the slow weight value in memory, the comparison steps to generate a third meta information and a fast weight, the transmission of the third meta information, the storing of the fast weight in memory, and the parameterizing of the meta weight values with the fast weight to update the slow weight. As such, the claim limitations are distinguishable from the reference.
Ba teaches training of a neural network with classification tasks, generation of slow and fast weights in a neural network, and a storing or caching of the slow and fast weights in relation to memory. However, Ba does not explicitly teach the meta-learning such as the meta space, the generation of the first and second meta information values, the generation of the first and second meta weight values, the comparison of those weight values to generate a slow weight value, the comparison steps to generate a third meta information and a fast weight, the transmission of the third meta information, and the parameterizing of the meta weight values with the fast weight to update the slow weight. As such, the claim limitations are distinguishable from the reference.

Claims 4-12 are allowable. Claims 13 and 14 would be allowable if rewritten or amended to overcome the issues as set forth in this Office action.
The following is a statement of reasons for the indication of allowable subject matter: the prior art references do not explicitly teach the claim limitations. 
Li teaches a meta-learner with a “two-level learning process” for classification task learning in a “meta space”, wherein the meta-learning process takes in a plurality of data batches to each meta-learner utilizing few-shot learning techniques. The process involves learning meta parameters, calculating empirical loss via gradient descent, and calculating prediction loss via mean squared error. The process also utilizes weight matrices that are initialized with a certain mean and standard deviation value. However, Li does not explicitly teach the first and second meta weights, generation of the representation loss and the representation loss gradients, generating the first fast weight, generating the task loss and the task loss gradient, mapping the task loss gradient through parameterizing by a second meta weight, integration via parameterizing of the respective slow weights and fast weights, generating a third fast weight via reading the weight memory with soft attention, generating the training loss via parameterizing with integration of the second slow and fast weights, and updating the various weights using the training loss and loss gradient associated with the respective weights. As such, the claim limitations are distinguishable from the reference.
Duan teaches a meta-learning framework that utilizes a one-shot imitation learning technique. The technique involves using a soft attention module with attention weights and inputs comprising a query, context vectors, and memory vectors. The outputs comprise “a weighted combination of the memory content, where the weights are given by a softmax operation over the attention weights.” Duan also teaches computing a cross-entropy loss related to an action and a demonstration of a task. However, Duan does not explicitly teach the first and second meta weights, generation of the representation loss and the representation loss gradients, generating the first fast weight, generating the task loss and the task loss gradient, mapping the task loss gradient through parameterizing by a second meta weight, integration via parameterizing of the respective slow weights and fast weights, generating a third fast weight, generating the training loss via parameterizing with integration of the second slow and fast weights, and updating the various weights using the training loss and loss gradient associated with the respective weights. As such, the claim limitations are distinguishable from the reference.
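For illustration only, the soft attention read quoted from Duan (a weighted combination of memory content, with weights given by a softmax) can be sketched as follows. This is a minimal sketch and is not taken from Duan or from the application: the function names, the dot-product scoring of the query against keys, and the toy values are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_read(query, keys, memory):
    """Return (read_vector, weights).

    Scoring by dot product is an assumption for this sketch;
    the read vector is the softmax-weighted sum of memory rows.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(memory[0])
    read = [sum(w * row[d] for w, row in zip(weights, memory))
            for d in range(dim)]
    return read, weights


# Hypothetical toy data: two memory slots, each with a key and a stored row.
keys = [[1.0, 0.0], [0.0, 1.0]]
memory = [[1.0, 0.0], [0.0, 1.0]]

# A zero query scores both slots equally, so the read is the plain average.
uniform_read, uniform_weights = soft_attention_read([0.0, 0.0], keys, memory)

# A query aligned with the first key puts more weight on the first slot.
biased_read, biased_weights = soft_attention_read([2.0, 0.0], keys, memory)
```

In this sketch the weights always sum to one, so the read vector is a convex combination of the stored memory rows.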
Gupta teaches a “multipart artificial neural network” (ANN) for meta-learning to classify questions. Input data can be received by the ANN to train the ANN on auxiliary tasks to assign labels to the input and to make predictions related to relationships between the words. The meta-learning process uses supervised learning to make a prediction in a hypothesis space that includes computing a loss function with regards to the prediction. The process also includes calculating weight values for the ANN and its nodes. In addition, Gupta also teaches a memory for storing information. However, Gupta does not explicitly teach the first and second meta weights, generation of the representation loss and the representation loss gradients, generating the first fast weight, generating the task loss and the task loss gradient, mapping the task loss gradient through parameterizing by a second meta weight, integration via parameterizing of the respective slow weights and fast weights, generating a third fast weight via reading the weight memory with soft attention, generating the training loss via parameterizing with integration of the second slow and fast weights, and updating the various weights using the training loss and loss gradient associated with the respective weights. As such, the claim limitations are distinguishable from the reference.
Vilalta teaches data classification via a classification system that uses a meta-learning process that includes model selection and generation of meta-rules and meta-features. The system receives a training domain data set and generates meta-features from the domain data set. Then, “the meta-learning model selection process 700 attempts to look for correlations among the meta-features to generate rules that show when to assign a specific model to a certain domain. The selection process 800 identifies the best matching rule to assign a model to a new domain. The meta-rules generation process 900 looks for correlations between meta features and models to determine when a model is best for a specific domain dataset.” The meta-learning process also involves calculating weighted distances between the training sample data sets to measure the variability of the class label associated with the training sample data sets. In addition, Vilalta also teaches a memory for storing instructions. However, Vilalta does not explicitly teach the first and second meta weights, generation of the representation loss and the representation loss gradients, generating the first fast weight, generating the task loss and the task loss gradient, mapping the task loss gradient through parameterizing by a second meta weight, integration via parameterizing of the respective slow weights and fast weights, generating a third fast weight via reading the weight memory with soft attention, generating the training loss via parameterizing with integration of the second slow and fast weights, and updating the various weights using the training loss and loss gradient associated with the respective weights. As such, the claim limitations are distinguishable from the reference.
Ba teaches training of a neural network with classification tasks, generation of slow and fast weights in a neural network, and a storing or caching of the slow and fast weights in relation to memory. However, Ba does not explicitly teach the first and second meta weights, generation of the representation loss and the representation loss gradients, generating the task loss and the task loss gradient, mapping the task loss gradient through parameterizing by a second meta weight, integration via parameterizing of the respective slow weights and fast weights, generating a third fast weight via reading the weight memory with soft attention, generating the training loss via parameterizing with integration of the second slow and fast weights, and updating the various weights using the training loss and loss gradient associated with the respective weights. As such, the claim limitations are distinguishable from the reference.

Claim 16 would be allowable if rewritten to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  claim 16 recites that the meta and base learners cooperatively integrate a first slow weight and a first fast weight using an augmentation layer approach, wherein an input of an augmentation layer is first transformed by the slow and fast weights, then passed through a non-linearity resulting in separate activation vectors, then the activation vectors are aggregated by an element-wise vector addition. 
Jeng teaches the meta learner and base learners, but it does not explicitly teach an integration of the slow and fast weights using an augmentation layer approach, and the transformation, passing, and aggregating of the vectors via element-wise addition. 
Vinyals teaches meta-learning, non-linearity via rectified linear unit (ReLU) activation, and memory augmented neural networks, but it does not explicitly teach a base learner, slow and fast weights, an integration of the slow and fast weights using an augmentation layer approach, and the transformation, passing, and aggregating of the vectors via element-wise addition. 
Ba teaches slow and fast weights, augmentation layer approach, non-linearity via rectified linear unit (ReLU) activation, vectors, and vector addition, but it does not explicitly teach meta and base learners, an integration of the slow and fast weights using an augmentation layer approach, that an input is first transformed by the slow and fast weights before being passed through the non-linearity to result in separate activation vectors, and then followed by an aggregation of those activation vectors via element-wise vector addition.
Therefore, the claim limitations are distinguishable from the cited references. 
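
For illustration only, the integration recited in claim 16 (the input is transformed by the slow and fast weights, each result is passed through a non-linearity to give separate activation vectors, and the activation vectors are aggregated by element-wise addition) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the use of plain matrix-vector products for the transformation, the choice of ReLU as the non-linearity, and the toy values are hypothetical and are not taken from the claims or the cited references.

```python
def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in vector]


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def augmented_layer(x, slow_weights, fast_weights):
    """Integrate slow and fast weights for one layer.

    1. Transform the input separately by the slow and the fast weights.
    2. Pass each result through the non-linearity (ReLU assumed here),
       giving two separate activation vectors.
    3. Aggregate the activation vectors by element-wise addition.
    """
    slow_activation = relu(matvec(slow_weights, x))
    fast_activation = relu(matvec(fast_weights, x))
    return [s + f for s, f in zip(slow_activation, fast_activation)]


# Hypothetical toy values chosen so the arithmetic is easy to follow.
x = [1.0, -2.0]
slow_w = [[1.0, 0.0], [0.0, 1.0]]    # slow path: relu([1, -2]) = [1, 0]
fast_w = [[0.5, 0.0], [-1.0, -1.0]]  # fast path: relu([0.5, 1]) = [0.5, 1]
out = augmented_layer(x, slow_w, fast_w)  # [1.5, 1.0]
```

Note that the non-linearity is applied to each path before the addition; adding the two transformed inputs first and then applying ReLU would in general give a different result.
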
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 13 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 13 recites the limitation “the task-dependent input representation” in step (iv). There is insufficient antecedent basis for this limitation in the claim. It is likely that Applicant intended to recite: “the first task-dependent input representation”. 
Claim 14 is rejected by virtue of its dependency from claim 13 and because it does not provide further clarification on the issue.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15 and 17-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 for all claims
Under the first part of the analysis, claims 15 and 17-19 recite a system. Accordingly, these claims fall within one of the four statutory categories and the analysis now proceeds to Step 2A, Prongs 1 and 2 and then Step 2B.
 
Claim 15
Step 2A, prong 1: the following limitations recite mental processes: 
“(ii) generate one or more fast weights, and 
(iii) optimize one or more slow weights…, based on the one or more fast weights and a training set of examples”. 
The above limitations describe mental processes because, under a broadest reasonable interpretation (BRI), they involve: generating fast weights and optimizing slow weights. Thus, the claim recites mental processes based on observations, evaluations, judgments, or opinions that are performable in the human mind or with the aid of pencil and paper (see MPEP 2106.04(a)(2)(III)). Indeed, one can mentally or with the aid of pencil and paper generate data comprising weights via observations, evaluations, judgments, or opinions. Likewise, one can mentally or with the aid of pencil and paper optimize weights based on other data such as fast weights and training sets via observations, evaluations, judgments, or opinions. As such, these limitations denote mental processes.
Step 2A, prong 2: the following limitations recite additional elements:
“A system for facilitating one-shot learning in neural network, comprising: 
a) a meta learner; 
b) a base learner operatively coupled to the meta learner, the meta learner and base learner implemented by a processor and an instruction memory with computer code instructions stored thereon, the instruction memory operatively coupled to the processor such that, when executed by the processor, the computer code instructions cause the system cooperatively (i) acquire meta information from a support set of examples, 
… used by the base learner …; and 
c) a weight memory device operatively coupled to the meta learner and the base learner, the meta learner and base learner being configured to cooperatively store the one or more slow weights and the one or more fast weights in the weight memory device.”
While the preamble recites “facilitating one-shot learning in a neural network”, this is insufficient to integrate the claim limitations into a practical application because, aside from this statement in the preamble, the claim limitations do not describe a neural network and one-shot learning technique. Rather, the limitations describe an acquiring step, a generation step, an optimization step, and a storing step. As such, the limitations do not provide a concrete tying between how these steps relate to or denote the facilitation of one-shot learning in the neural network. Indeed, the limitations as recited might not have any relation to a neural network or one-shot learning in a neural network at all because one can apply the limitations simply to model various data and to acquire, generate, optimize, and store data associated with various weights. Thus, when reading the preamble in the context of the entire claim, the recitation “facilitating one-shot learning in a neural network” is not limiting because the body of the claim describes a complete invention and the language recited solely in the preamble does not provide any distinct definition of any of the claimed invention’s limitations. Therefore, the preamble of the claim(s) is not considered a limitation and is of no significance to claim construction. See Pitney Bowes, Inc. v. Hewlett-Packard Co., 182 F.3d 1298, 1305, 51 USPQ2d 1161, 1165 (Fed. Cir. 1999). See MPEP § 2111.02. As such, the recitation of “facilitating one-shot learning in a neural network” in the preamble does not integrate the claim into a practical application.
The limitation reciting the meta and the base learners denotes additional elements related to mere instructions for applying the judicial exception on a generic computing system such as the meta and the base learners (see MPEP 2106.05(f)). 
The limitation describing the implementing via processors and instruction memory with computer code instructions also denotes additional elements related to mere instructions for applying the judicial exception on a generic computing system such as the meta and the base learners (see MPEP 2106.05(f)), as well as a generic field of use via application to processors and instruction memory with computer code instructions (see MPEP 2106.05(h)). These recitations of the processors and instruction memory with computer code instructions denote a generic computing environment (see MPEP 2106.05(h)) and do not amount to anything more than the use of the generic computing environment to perform the mental processes (see MPEP 2106.04(a)(2)(III)(C)). 
The limitation describing acquiring meta information recites, at a high level of generality, the additional element of an insignificant extra-solution activity related to mere data gathering (see MPEP 2106.05(g)). The limitations describing the use by the base learner relate to mere instructions for applying the judicial exception (see MPEP 2106.05(f)). The system in the preamble and the limitations describing the operative coupling between the two learners and the operative coupling between the memory device and the two learners denote a field of use indicative of a generic computing environment (see MPEP 2106.05(h)). The limitation describing storing the slow and fast weights in the weight memory device recites, at a high level of generality, the additional element of an insignificant extra-solution activity related to mere data storage (see MPEP 2106.05(g)). 
Thus, the limitations taken together do not integrate the judicial exception into a practical application. 
Step 2B: the limitations recited above do not amount to significantly more than the judicial exception. As stated above, the limitations describing the meta and the base learners, the use by the base learners, and the implementing via processors and instruction memory with computer code instructions relate to mere instructions to apply the judicial exception on generic computing devices, wherein such application does not amount to significantly more than the judicial exception because the use of generic computing tools to execute the instruction for the judicial exception does not denote anything significantly more than the judicial exception (see MPEP 2106.05(f)). 
Additionally, the system in the preamble and the limitations describing the operative coupling between the two learners, the operative coupling between the weight memory device and the two learners, and the processors and instruction memory with computer code instructions denote a field of use indicative of a generic computing environment (see MPEP 2106.05(h)). Such an implementation in a generic computer environment has been held in FairWarning v. Iatric Sys to be merely indicative of a field of use or technological environment and thus not significantly more than the judicial exception (see MPEP 2106.05(h)).
The limitations reciting acquiring meta information and storing the weights in the weight memory device denote mere data gathering and data storage indicative of insignificant extra-solution activities (see MPEP 2106.05(g)). The courts have held that “receiving or transmitting data over a network” and “storing and retrieving information in memory” are well-understood, routine, and conventional activities when recited at a high level of generality (see MPEP 2106.05(d)(II)). 
As such, the limitations do not amount to significantly more than the judicial exception. 

Claim 17
Step 2A, prong 1: the following limitations recite a mathematical concept: “wherein the non-linearity is implemented with a rectified linear unit (ReLU)”. The limitation denotes a mathematical relationship because a rectified linear unit relates to a mathematical relationship or calculation or formula (see MPEP 2106.04(a)(2)(I)). As such, the limitation denotes a mathematical concept.
Step 2A, prong 2: the claim does not recite any additional elements that integrate the judicial exception into a practical application.
Step 2B: the claim does not recite any additional elements that amount to significantly more than the judicial exception.

Claim 18
Step 2A, prong 1: the claim inherits the mental processes from the independent claim. The claim does not recite additional mental processes. 
Step 2A, prong 2: the claim recites the additional element: “wherein the support set of examples and the training set of examples further comprise class labels”. The composition of the support set of examples denotes a field of use (see MPEP 2106.05(h)). 
Step 2B: the limitation recited above does not amount to significantly more than the judicial exception because it merely denotes a field of use related to the support set of examples (see MPEP 2106.05(h)).

Claim 19
Step 2A, prong 1: the following limitations recite mental processes: “evaluate each example instance from the support set of examples and the training set of examples, and generate the one or more fast weights and optimize the one or more slow weights based on the example instance, before an evaluation of a subsequent example instance.”
The above limitations describe mental processes because, under a broadest reasonable interpretation (BRI), they involve: evaluating an example instance and generating the fast weights and optimizing the slow weights before a subsequent example instance occurs. Thus, the claim recites mental processes because the processes are based on observations, evaluations, judgments, or opinions that are performable in the human mind or with the aid of pencil and paper (see MPEP 2106.04(a)(2)(III)). Indeed, the claim limitations mainly relate to evaluating example instance data and generating and optimizing weights based on the example instance data before a subsequent example instance occurs. As such, these limitations are conceivably performed mentally or with the aid of paper and pencil and thus are considered as mental processes.
Step 2A, prong 2: the claim recites the additional element: “wherein the meta learner and base learner are configured to”. The limitation recites additional elements related to mere instructions for applying the judicial exception on a generic computing device such as the two learners (see MPEP 2106.05(f)). The recitation of the two learners denotes generic computer components (see MPEP 2106.05(h)).
Step 2B: the limitation recited above does not amount to significantly more than the judicial exception. As stated above, the limitation describing the two learners relates to mere instructions to apply the judicial exception on generic computing devices, wherein such application does not amount to significantly more than the judicial exception because the use of generic computing tools to execute the instruction for the judicial exception does not denote anything significantly more than the judicial exception (see MPEP 2106.05(f)). The limitation does not amount to anything more than the use of the generic computing environment to perform the mental processes (see MPEP 2106.04(a)(2)(III)(C)). 
Furthermore, specifying the utilization of the two learners is an implementation to a generic computer environment that has been held in FairWarning v. Iatric Sys to be merely indicative of a field of use or tech environment and thus not significantly more than the judicial exception (see MPEP 2106.05(h)).
As such, the limitations do not amount to significantly more than the judicial exception. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jeng et al. (U.S. Pat. App. Pre-Grant Pub. No. 2004/0024745, hereinafter Jeng) in view of Vinyals et al., “Matching Networks for One Shot Learning” (hereinafter Vinyals), Ba et al., “Using Fast Weights to Attend to the Recent Past” (hereinafter Ba), and Sow et al. (U.S. Pat. App. Pre-Grant Pub. No. 2014/0279741, hereinafter Sow).

Regarding claim 15, Jeng teaches:
A system …, comprising ([0017]: describing a computing system as shown in Fig. 1.): 
a) a meta learner ([0026]-[0028]: describing a “meta learner”.) 
b) a base learner operatively coupled to the meta learner, the meta learner and base learner … cooperatively ([0026]-[0028]: describing an operative coupling between the meta learner and “base learner” wherein data, e.g. computation results related to various tasks, are cooperatively passed between the two. This is shown in Fig. 5. The meta learner was previously described.)
(i) acquire meta information from a support set of examples ([0024]-[0025]: describing meta processing of domain data sets, i.e. support set of examples, to obtain information.), 
…, and 
… by the base learner ([0028]: “by base learner”.),…; and 
c) … device operatively coupled to the meta learner and the base learner ([0017] and [0028]: describing that the system comprises computers and servers, which have associated devices. Wherein the system also includes the meta and base learners that are operatively coupled together and with the associated devices in the system ([0026]-[0028]). See also Figs. 1 and 5: showing the system and the two learners. The meta learner was previously described.), the meta learner and base learner being configured to cooperatively ([0017]-[0018] and [0026]-[0028]: describing an operative coupling between the meta learner and base learner. This is shown in Fig. 5.)….

While the cited reference Jeng teaches the above limitations of claim 15, it does not explicitly teach: “for facilitating one-shot learning in neural network” in the preamble. Vinyals teaches: one-shot learning within a neural network (Vinyals Sections 2-2.2 and 5). 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning in Jeng to include the one-shot learning in Vinyals. Doing so would enable a “new neural architecture that, by way of its corresponding training regime, is capable of state-of-the-art performance on a variety of one-shot classification tasks” (Vinyals Section 5). 

While the cited references in combination teach the above limitations of claim 15, they do not explicitly teach: “(ii) generate one or more fast weights”; “(iii) optimize one or more slow weights used… based on the one or more fast weights and a training set of examples”; and “store the one or more slow weights and the one or more fast weights”. Ba teaches: 
“(ii) generate one or more fast weights”: describing generation of fast weights in a neural network (Ba Sections 3 and 3.1). 
“(iii) optimize one or more slow weights used … based on the one or more fast weights and a training set of examples”: describing that the slow weights of neural network layers can learn, i.e. are optimized, by stochastic gradient descent (Ba Section 3 and Supplemental Section A). Wherein the learning of the slow weights of the neural network occurs in correlation with fast weights (Ba Section 3) and with training data (Ba Section 4.1 and Supplemental Section A). See also Figs. 1 and 3 showing the slow and fast weights in the layers of the neural network.
“store the one or more slow weights and the one or more fast weights”: describing storing/caching of the slow and fast weights in relation to memory (Ba Sections 3 and 4.2).     
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with one-shot learning in the combined cited references to include the slow and fast weights in Ba. Doing so would enable an improvement in “machine learning by showing that the performance of [neural networks] on a variety of different tasks can be improved by introducing a mechanism that allows each new state of the hidden units to be attracted towards recent hidden states in proportion to their scalar products with the current state” (Ba Section 5). 
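For illustration only, the distinction between slow and fast weights discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a decayed outer-product update of the kind associated with fast-weight memories, and the function names, the decay value, and the rate value are hypothetical and are not taken from Ba or from the claims.

```python
# Illustrative sketch only; names and constants are hypothetical.

def outer(u, v):
    """Outer product of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def update_fast_weights(A, h, decay=0.95, rate=0.5):
    """Decay the previous fast weights and add the outer product of h with itself."""
    hh = outer(h, h)
    n = len(h)
    return [[decay * A[i][j] + rate * hh[i][j] for j in range(n)] for i in range(n)]

def matvec(M, v):
    """Matrix-vector product, used here to read from the fast-weight memory."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Slow weights would be ordinary parameters learned by gradient descent (omitted here).
# Fast weights start at zero and are regenerated at every time step from the hidden state.
A = [[0.0, 0.0], [0.0, 0.0]]
for hidden_state in ([1.0, 0.0], [0.0, 1.0]):
    A = update_fast_weights(A, hidden_state)

# Reading the memory with the most recent hidden state.
readout = matvec(A, [0.0, 1.0])
```

In this sketch the older hidden state contributes less to the memory than the newer one because its contribution has been decayed once.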

While the cited references in combination teach the above limitations of claim 15, they do not explicitly teach: learners “implemented by a processor and an instruction memory with computer code instructions stored thereon, the instruction memory operatively coupled to the processor such that, when executed by the processor, the computer code instructions cause the system”; “a weight memory device”; and “in the weight memory device”. Sow teaches: 
“implemented by a processor and an instruction memory with computer code instructions stored thereon, the instruction memory operatively coupled to the processor such that, when executed by the processor, the computer code instructions cause the system”: describing learners implemented by a processor of a computer and code/instructions via memory that is coupled to the processor for execution of the code/instructions by the processor (Sow [0028], [0049]-[0056] and [0058]-[0059]). 
“a weight memory device” and “in the weight memory device”: describing a process that includes computing and updating weights associated with the learners, wherein the weights used in such process are stored in various memory devices (Sow [0038]-[0040], [0050] and [0058]), thus denoting a weight memory device. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with one-shot learning and fast and slow weights in the combined cited references to include the computer implementation and memory device in Sow. Doing so would enable “a system and method for hierarchical online learning. In exemplary embodiments, end-to-end prediction performance and convergence rate may be improved by utilizing a set of hierarchically organized meta-learners that automatically combine predictions from different heterogeneous learners. Exemplary embodiments may be implemented in a variety of systems in which data may be obtained from various data sources and used to solve a prediction objection” (Sow [0026]).

Regarding claim 17, the rejection of claim 15 is incorporated. Vinyals further teaches:
The system of claim 15, wherein the non-linearity is implemented with a rectified linear unit (ReLU) (Vinyals Section 4.1.1: describing that the neural network model comprises “a Relu non-linearity”.).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with the slow and fast weights and the computer implementation in the combined cited references to include the ReLU in Vinyals. Doing so would enable a “new neural architecture that, by way of its corresponding training regime, is capable of state-of-the-art performance on a variety of one-shot classification tasks” (Vinyals Section 5). 
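For reference, a rectified linear unit is commonly defined as the function that returns its input when the input is positive and zero otherwise. A minimal sketch of that standard definition follows; the function names are hypothetical and the snippet is not taken from Vinyals.

```python
def relu(x):
    """Rectified linear unit: keeps positive inputs and maps all others to zero."""
    return max(0.0, x)

def relu_layer(values):
    """Apply the non-linearity element-wise to a layer's pre-activations."""
    return [relu(v) for v in values]

activations = relu_layer([-2.0, 0.0, 3.5])
```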

Regarding claim 18, the rejection of claim 15 is incorporated. Vinyals further teaches:
The system of claim 15, wherein the support set of examples and the training set of examples further comprise class labels (Vinyals Sections 2.1, 4.1.2, and 4.1.3: describing that the “support set of k examples” and “test example” data sets comprise class labels.).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with the slow and fast weights and the computer implementation in the combined cited references to include the class labels in Vinyals. Doing so would enable a “training procedure [that] is based on a simple machine learning principle: test and train conditions must match. Thus, to train our network to do rapid learning, we train it by showing only a few examples per class, switching the task from minibatch to minibatch, much like how it will be tested when presented with a few examples of a new task” (Vinyals Section 1). 

Regarding claim 19, the rejection of claim 15 is incorporated. Jeng teaches:
The system of claim 15, wherein the meta learner and base learner are configured to evaluate each example instance … and the training set of examples ([0026] and [0028]: describing that the meta learner and base learner analyze each data set from a plurality of data sets, i.e., sets of examples that comprise training sets of examples.), and ….

While the cited reference Jeng teaches the above limitations of claim 19, it does not explicitly teach: examples “from the support set of examples”. Vinyals further teaches: examples from a support set of example data (Vinyals Sections 2.1-2.2 and all of Section 4.1).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with the slow and fast weights and the computer implementation in the combined cited references to include the support sets in Vinyals. Doing so would enable a training procedure wherein “[g]iven a (small) support set S, our model defines a function cS (or classifier) for each S, i.e. a mapping S → cS(.)” (Vinyals Section 2). 
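For illustration only, the mapping from a support set S to a classifier cS quoted above can be sketched as a function that returns another function. This is a minimal sketch assuming a softmax over cosine similarities between a query and the support examples; the two-dimensional embeddings, the labels, and all names are hypothetical and are not taken from Vinyals.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def make_classifier(support_set):
    """Map a support set S of (embedding, label) pairs to a classifier c_S."""
    def classify(query):
        sims = [cosine(query, x) for x, _ in support_set]
        peak = max(sims)
        exps = [math.exp(s - peak) for s in sims]  # numerically stable softmax
        total = sum(exps)
        scores = {}
        for weight, (_, label) in zip(exps, support_set):
            # Each support example votes for its own label with its attention weight.
            scores[label] = scores.get(label, 0.0) + weight / total
        return max(scores, key=scores.get), scores
    return classify

# Hypothetical one-shot task: one labeled example per class.
support = [([1.0, 0.0], "cat"), ([0.0, 1.0], "dog")]
c_S = make_classifier(support)
label, scores = c_S([0.9, 0.1])
```

Supplying a different support set yields a different classifier without retraining any parameters, which is the sense in which the classifier is defined by S.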

While the cited references in combination teach the above limitations of claim 19, they do not explicitly teach: “generate the one or more fast weights and optimize the one or more slow weights based on the example instance, before an evaluation of a subsequent example instance”. Ba further teaches: 
“generate the one or more fast weights (Ba Sections 3 and 3.1: describing generation of fast weights in the neural network) and optimize the one or more slow weights based on the example instance (Ba Section 3 and Supplemental Section A: describing that the slow weights of neural network layers can learn, i.e. are optimized, by stochastic gradient descent based on current inputs.), before an evaluation of a subsequent example instance (Ba Sections 3 and 3.1: describing an analysis at each iteration in a current hidden state with the current inputs at a “particular time step”. The inputs include a plurality of validation and test examples (Ba Section 4.1). That is, the current inputs are analyzed at each iteration before additional inputs are analyzed.)”.
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with one-shot learning, the computer implementation, and the support sets in the combined cited references to include the slow and fast weights in Ba. A motivation to combine the cited references with Ba was previously given.
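For illustration only, the ordering recited in the limitation (both weight updates for one example instance complete before the next instance is evaluated) can be sketched as a per-instance loop. This is a minimal sketch under stated assumptions: a single scalar slow weight is updated by one stochastic-gradient step per example, and the fast weight is a hypothetical example-specific quantity derived from the current error. None of the names or values are taken from Ba, Jeng, or the claims.

```python
def train_online(examples, slow_w=0.0, lr=0.1):
    """Process examples one at a time, finishing both updates before the next example."""
    log = []
    for idx, (x, y) in enumerate(examples):
        # Evaluate the current example instance with the current slow weight.
        error = y - slow_w * x
        # Hypothetical fast weight: an example-specific signal generated from this instance.
        fast_w = error * x
        log.append(("fast", idx))
        # Slow weight: one gradient-descent step on 0.5 * error**2, whose gradient
        # with respect to slow_w is -error * x.
        slow_w += lr * fast_w
        log.append(("slow", idx))
    return slow_w, log

slow_w, log = train_online([(1.0, 2.0), (2.0, 4.0)])
```

The returned log records that the fast and slow updates for instance 0 both occur before any update for instance 1.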

Regarding claim 20, the rejection of claim 15 is incorporated. Jeng teaches:
The system of claim 15, wherein the meta learner and base learner ([0017]-[0018] and [0026]-[0028]: describing the meta learner and base learners.)….  

While the cited reference Jeng teaches the above limitations of claim 20, it does not explicitly teach: “are integrated by a layer augmented multilayer perceptron (MLP)”. Ba further teaches: an integration process that involves augmenting multiple hidden layers of the neural network, i.e. a multilayer perceptron (Ba Section 4.2; see also Figs. 1 and 3: showing the augmentation of the hidden layers).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the system for meta learning with one-shot learning and the implementation in the combined cited references to include the MLP in Ba. Doing so would enable an improvement in “machine learning by showing that the performance of [neural networks] on a variety of different tasks can be improved by introducing a mechanism that allows each new state of the hidden units to be attracted towards recent hidden states in proportion to their scalar products with the current state” (Ba Section 5).  
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Hornik et al., “Open-source machine learning: R meets Weka”. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762. The examiner can normally be reached M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS, can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2128      

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128