DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-01-14 has been entered.  Applicant’s amendments to the Specification have overcome the objection previously set forth in the Non-Final Office Action mailed 2021-10-14.  The claim status is as follows:
Claims 1-20 remain pending in the application.
Claims 1, 5, 8, 14, and 19 are amended.
Response to Arguments
Applicant’s arguments with respect to the rejections under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The amendments have changed the claim scope and necessitated a change in the applied art.
Claim Interpretation
Examiner notes that, in claims 1-20, the limitation that the models are specifically BPL models is nonfunctional descriptive material. The type of model does not impact the functionality of the claim, and no specific structure is provided that would impart any functionality or change when using BPL models as opposed to any other models. Nevertheless, in an effort to practice 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-7 and 14-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. 
Claims 1 and 14 recite the following limitation:
“generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, to train a second BPL model having a second set of parameters”
It is unclear what exactly is being used to train the second BPL model (the “plurality of hypotheses”, the “plurality of facts”, the “plurality of beliefs”, or “the set of generation models”).  If the “plurality of hypotheses” is what is intended (as Examiner suspects based on the arguments in response to Claim 1 on Remarks Page 10), then Examiner suggests the following amendment:  “generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, the plurality of hypotheses being used to train a second BPL model having a second set of parameters”. 
Dependent claims 2-7 and 15-20 are rejected because they inherit the deficiencies of claims 1 and 14.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, 14-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (“DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning”; hereinafter “Xiong”) in view of Aslan et al. (US 2017/0132528 A1; hereinafter “Aslan”), Walters et al. (US 2020/0012584 A1; hereinafter “Walters”), and Lake et al. (“Human-level concept learning through probabilistic program induction”; hereinafter “Lake”).
As per Claim 1, Xiong teaches a method, comprising: identifying a plurality of facts based on a knowledge graph data structure that represents a first plurality of data records and a second plurality of data records, each data record from the first plurality of data records being associated with an entity from a plurality of entities, the second plurality of data records indicating a plurality of relationships associated with the first plurality of data records (Xiong, Figure 1, discloses:

[Image: Xiong, Figure 1]

Here, Xiong discloses a knowledge graph data structure, which represents a first plurality of data records each associated with an entity (in the white ovals), and a second plurality of data records indicating relationships associated with the entities (in the gray ovals).  Conversely, the mapping could also work with the first plurality of data records being the gray ovals and the second plurality being the white ovals.  The facts, as stated in the Caption (“existing relation links”), are represented by the dotted arrows.)
	inferring a plurality of beliefs from the plurality of facts using [the set of inference models] a set of inference criteria, to train a first [BPL] model having a first set of parameters (Xiong, Figure 1 Caption discloses:  “The dotted arrows (partially) show the existing relation links in the KG and the bold arrows show the reasoning paths found by the RL agent”.  Here, Xiong discloses inferring a plurality of beliefs (“bold arrows”) from the plurality of facts (“dotted arrows”).  This is done using a set of inference criteria, as Xiong describes the bold arrows as “reasoning paths”, and “reasoning” is performing an inference, which must be based on some criteria.  Xiong, Section 3.1 Para 2, discloses:  “The second part of the system, the RL agent, is represented as a policy network $\pi_{\theta}(s, a) = p(a \mid s; \theta)$ which maps the state vector to a stochastic policy. The neural network parameters $\theta$ are updated using stochastic gradient descent.”  Here, Xiong discloses training (“stochastic gradient descent”) a learning model (“RL agent”) having a first set of parameters (“neural network parameters”)).
*Using the output of one model to train another model will be taught by Aslan below. 
*BPL will be taught by Lake below.
	generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs using [the set of generation models] a set of generation criteria, [to train a second BPL model having a second set of parameters] (Recall above that Xiong Figure 1 discloses generating “beliefs” from “facts”.
	Note that Xiong Figure 1 also discloses that the “state” of the graph is used to determine the next reason step, and that this state is updated.  Therefore, the process is cumulative, and after Xiong generates “beliefs”, a second step can generate “hypotheses”.  Note that the “Query Node” in the example is “Band of Brothers” and the “Reason Task” is “tvProgramLanguage”.  Note also that the path from “Band of Brothers” to “English” requires at least two steps.  Thus, “beliefs” (bold arrows to “United States”, “Neal McDonough”, and “Tom Hanks”) were generated in the first step based on the “facts” (dotted arrows).  Subsequently, the “hypotheses” are the sets of bold arrows ultimately linking “Band of Brothers” to “English”.  Also recall above that Xiong discloses that this is part of training the model with Reinforcement Learning, and that this is based on a set of criteria that is used to generate the hypotheses.)
*Using the output of one model to train a second model will be taught by Aslan below. 
*BPL will be taught by Lake below.
generating at least one hypothesis in response to input data associated with an entity [using the first BPL model and the second BPL model] (Recall above that Xiong Figure 1 discloses generating the hypothesis that the entity “Band of Brothers” is in “English”.)
*Two models will be taught by Aslan below.
*BPL will be taught by Lake below.
updating the first set of parameters [and the second set of parameters] based on the at least one hypothesis (Recall above that Xiong discloses generating a hypothesis.  Xiong, Section 3.1 Para 2, discloses:  “The second part of the system, the RL agent, is represented as a policy network $\pi_{\theta}(s, a) = p(a \mid s; \theta)$ which maps the state vector to a stochastic policy. The neural network parameters $\theta$ are updated using stochastic gradient descent.”  Here, Xiong discloses training (“stochastic gradient descent”) a learning model (“RL agent”) having a first set of parameters (“neural network parameters”)).
*Two models (second set of parameters) will be taught by Aslan below.
However, Xiong does not teach searching a concept library including a set of conceptual models encoded as Bayesian Program Learning (BPL) models for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the BPL models including a set of dependencies among the set of conceptual models; searching the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models; inferring a plurality of beliefs from the plurality of facts using the set of inference models, to train a first BPL model having a first set of parameters; generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, to train a second BPL model having a second set of parameters; updating the first set of parameters and the second set of parameters based on the at least one hypothesis.
	Aslan teaches inferring a plurality of beliefs from the plurality of facts using the set of inference models, to train a first [BPL] model having a first set of parameters; (Recall above that Xiong discloses inferring a plurality of beliefs from the plurality of facts and generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs.  Aslan discloses two models, wherein the output of the first model is used to train the second model.   Aslan, Para [0027], discloses:  “Instead, with the joint training technique of FIG. 1, the second model's 102 training can be influenced by the first model 100 (e.g., by the second model 102 having access to information about the outputs of the first model 100 based on the first model's 100 processing of the training data 104 as input) while the first model 100 is training, and/or prior to the first model 100 completing its training.”  Thus, when combined with Xiong, the output beliefs of an inference model from the set of inference models (“first model”) are used to train a “first model having a first set of parameters” (“second model”)).
*BPL will be taught by Lake below.
generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, to train a second [BPL] model having a second set of parameters (Recall above that Xiong discloses generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs.  Aslan discloses two models, wherein the output of the first model is used to train the second model.   Aslan, Para [0027], discloses:  “Instead, with the joint training technique of FIG. 1, the second model's 102 training can be influenced by the first model 100 (e.g., by the second model 102 having access to information about the outputs of the first model 100 based on the first model's 100 processing of the training data 104 as input) while the first model 100 is training, and/or prior to the first model 100 completing its training.”  Thus, when combined with Xiong, the output hypotheses of a generation model from the set of generation models (“first model”) are used to train a “second model having a second set of parameters” (“second model”).  
Note that Aslan’s “first model” and “second model” can also be applied across these two limitations, wherein Aslan’s “first model” is the “first model having a first set of parameters” and Aslan’s “second model” is the “second model having a second set of parameters”).
updating the first set of parameters and the second set of parameters based on the at least one hypothesis (As shown above, Aslan’s “first model” and “second model” can also be applied across the previous two limitations, wherein Aslan’s “first model” is the “first model having a first set of parameters” and Aslan’s “second model” is the “second model having a second set of parameters”.  Besides disclosing that output can go from the “first model” to the “second model”, Aslan also discloses that the output of the second model (“at least one hypothesis”) can go back to the first model during a “joint training” process.   Aslan discloses this in Para [0031]:  “So far, two possible directions for transferring knowledge (or passing information) between the multiple models 100 and 102 during joint training have been discussed with reference to paths 106 and 108 of FIG. 1. Additionally, knowledge can be bi-directionally transferred between the first model 100 and the second model 102 during joint training, as depicted visually in FIG. 1 by path 110 between the first model 100 and the second model 102. In other words, data can be processed by each model 100 and 102, and the objective function used for joint training of the models 100 and 102 can determine the degree to which the models 100 and 102 agree with each other, and can “push” the models toward agreement.”)
Xiong and Aslan are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong with the two models with joint training of Aslan.  One of ordinary skill in the art would be motivated to do so in order to achieve better performance with more accurate results (Aslan, Para [0008]:  “The joint model training techniques described herein provide greater flexibility as compared to current model training methods due to the ability of at least one model to influence the training of at least one other model during the joint training process. In this sense, a machine learning model is able to see what another machine learning model is learning, as the other machine learning model is learning. Furthermore, multiple machine learning models can be trained in a collaborative fashion where visibility across models is enabled, which can lead to one machine learning model selecting a learning function that is best suited for another machine learning model. Machine learning models that are trained using the techniques described herein can perform better (in terms of the accuracy of the model output) than conventionally-trained machine learning models in some scenarios. Furthermore, the machine learning models that are trained with the techniques and systems described herein can be deployed or implemented in a more versatile fashion.”)
However, the combination of Xiong and Aslan thus far fails to teach searching a concept library including a set of conceptual models encoded as Bayesian Program Learning (BPL) models for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the BPL models including a set of dependencies among the set of conceptual models; searching the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models.
	Walters teaches searching a concept library including a set of conceptual models [encoded as Bayesian Program Learning (BPL) models] for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the [BPL] models including a set of dependencies among the set of conceptual models; (Recall above that Xiong discloses inferring a belief based on a set of inference criteria.  Walters, Para [0005], discloses:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application.”  Here, Walters discloses a concept library (“model library”) that has a set of models (“conceptual models” and “inference models”, which would satisfy a “set of inference criteria”). This search is based on a set of boundary conditions (“model with the desired degree of accuracy can be selected”).  All machine learning models include a set of dependencies, upon which they are trained.  Walters also describes searching the model library.  Walters, Para [0188], describes the library as indexed:  “At step 1814, model optimizer 107 stores the optimized model. In some embodiments, the optimized model is stored in a model library. For example, model optimizer 107 may store the model in model storage 109. Storing the model at step 1814 may comprise updating an index of models.”  Walters, Para [0180], describes the searching of the model library:  “At step 1804, an input model is received by model optimizer 107. 
The input model may be one of a machine learning model or a statistical model, consistent with disclosed embodiments. In some embodiments, the input model is a seed model received at step 1802 via interface 113. In some embodiments, receiving the input model at step 1804 includes generating or retrieving a model based on at least one of the desired outcome, a model characteristic, or a model index. In some embodiments, receiving the input model at step 1804 includes retrieving the input model from a model storage (e.g., model storage 109). The model characteristic may include one of a model type, a data schema, a data statistic, a training dataset type, a model task, a hyperparameter, a training dataset, or an outcome associated with the model. For example, step 1804 may include selecting the candidate model from among a plurality of candidate models in model storage 109 based on a determination that the desired outcome corresponds to an outcome associated with the selected candidate model.”)
searching the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models (Walters, as shown above, discloses a model library, and searching that library for a model based on a set of boundary conditions.  Recall above also that Xiong discloses “generating a plurality of hypotheses”, and thus the set of conceptual models are “generation models” when combined with Xiong.)
Walters and the combination of Xiong and Aslan are analogous art because they are both in the field of endeavor of machine learning.
 It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong and Aslan with the model library of Walters.  One of ordinary skill in the art would be motivated to do so in order to use the model best suited for a given situation, and to achieve desired accuracy (Walters, Para [0005]:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application. Furthermore, development of high-performance models can be enhanced through model re-use. For example, a user may develop a first model for a first application involving a dataset. Latent information and relationships present in the dataset may be embodied in the 
	However, the combination of Xiong, Aslan, and Walters does not teach Bayesian Program Learning (BPL) model.
	Lake teaches Bayesian Program Learning (BPL) model.  (Lake, Pg 1333 First Full Paragraph, begins:  “This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people.”)
	Lake and the combination of Xiong, Aslan, and Walters are analogous art because they are both in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong, Aslan, and Walters with the BPL of Lake.  The combination would result in a knowledge graph reasoning model that is capable of “learning to learn” and making inferences based on data, as stated in Lake, Discussion:  

	As per Claim 2, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1.  Xiong teaches wherein the first plurality of data records and the second plurality of data records include at least one of image data, video data, audio data, textual data, or time series data.  (Xiong, Figure 1, discloses a knowledge graph where the first and second plurality of data records comprise textual data, for example “United States” and “Nationality”, respectively.)

	As per Claim 3, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1.  Xiong teaches wherein the first plurality of data records and the second plurality of data records are received from at least one of a database, a file system, or an application.  (Xiong, Section 4.1, discloses:  “Table 1 shows the statistics of the two datasets we conduct our experiments on. Both of them are subsets of larger datasets. The triples in FB15K-237 (Toutanova et al., 2015) are sampled from FB15K (Bordes et al., 2013) with redundant relations removed.”  Here, Xiong discloses databases (“FB15K-237” and “FB15K”)).

	As per Claim 6, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1 and the second BPL model (see Rejection to Claim 1).  Xiong teaches wherein the second BPL model is at least one of a Bayesian inference model or a reinforcement learning model.  (Xiong, Section 2 Last Paragraph, discloses:  “NSM learns to compose programs that can find answers to natural language questions, while our RL model tries to add new facts to knowledge graph (KG) by reasoning on existing KG triples.”  Here, Xiong discloses a reinforcement learning model (“RL model”)).

As per Claim 14, Xiong teaches a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to (Xiong, Acknowledgments, discloses:  “We gratefully acknowledge the support of NVIDIA Corporation with the donation of one Titan X Pascal GPU used for this research.”  Here, Xiong discloses the use of a processor, which also implies the use of a non-transitory processor-readable medium storing code).
define a plurality of facts based on a knowledge graph data structure that represents a first plurality of data records and a second plurality of data records, the first plurality of data records associated with a plurality of entities, the second plurality of data records associated with a plurality of relationships, the plurality of relationships associated with the first plurality of data records (Xiong, Figure 1, discloses:

[Image: Xiong, Figure 1]

Here, Xiong discloses a knowledge graph data structure, which represents a first plurality of data records each associated with an entity (in the white ovals), and a second plurality of data records indicating relationships associated with the entities (in the gray ovals).  Conversely, the mapping could also work with the first plurality of data records being the gray ovals and the second plurality being the white ovals.  The facts, as stated in the Caption (“existing relation links”), are represented by the dotted arrows.)
infer a plurality of beliefs from the plurality of facts using [the set of inference models] a set of inference criteria, to train a first [BPL] model having a first set of parameters (Xiong, Figure 1 Caption discloses:  “The dotted arrows (partially) show the existing relation links in the KG and the bold arrows show the reasoning paths found by the RL agent”.  Here, Xiong discloses inferring a plurality of beliefs (“bold arrows”) from the plurality of facts (“dotted arrows”).  This is done using a set of inference criteria, as Xiong describes the bold arrows as “reasoning paths”, and “reasoning” is performing an inference, which must be based on some criteria.  Xiong, Section 3.1 Para 2, discloses:  “The second part of the system, the RL agent, is represented as a policy network $\pi_{\theta}(s, a) = p(a \mid s; \theta)$ which maps the state vector to a stochastic policy. The neural network parameters $\theta$ are updated using stochastic gradient descent.”  Here, Xiong discloses training (“stochastic gradient descent”) a learning model (“RL agent”) having a first set of parameters (“neural network parameters”)).
*Using the output of one model to train another model will be taught by Aslan below. 
*BPL will be taught by Lake below.
	generate a plurality of hypotheses from the plurality of facts and the plurality of beliefs using [the set of generation models] a set of generation criteria, [to train a second BPL model having a second set of parameters] (Recall above that Xiong Figure 1 discloses generating “beliefs” from “facts”.
	Note that Xiong Figure 1 also discloses that the “state” of the graph is used to determine the next reason step, and that this state is updated.  Therefore, the process is cumulative, and after Xiong generates “beliefs”, a second step can generate “hypotheses”.  Note that the “Query Node” in the example is “Band of Brothers” and the “Reason Task” is “tvProgramLanguage”.  Note also that the path from “Band of Brothers” to “English” requires at least two steps.  Thus, “beliefs” (bold arrows to “United States”, “Neal McDonough”, and “Tom Hanks”) were generated in the first step based on the “facts” (dotted arrows).  Subsequently, the “hypotheses” are the sets of bold arrows ultimately linking “Band of Brothers” to “English”.  Also recall above that Xiong discloses that this is part of training the model with Reinforcement Learning, and that this is based on a set of criteria that is used to generate the hypotheses.)
*Using the output of one model to train a second model will be taught by Aslan below. 
*BPL will be taught by Lake below.
detect at least one of a new fact, a new belief, or a new hypothesis (Recall above that Xiong Figure 1 discloses generating new beliefs and hypotheses.  For example, in Figure 1, Xiong discloses “Band of Brothers” casts “Neal McDonough”.)
*Two models will be taught by Aslan below.
*BPL will be taught by Lake below.
improve at least one of the first BPL model or the second BPL model based on at least one of the new fact, the new belief, or the new hypothesis. (Recall above that Xiong discloses generating a hypothesis, as part of an iterative process of generating new beliefs and hypotheses.  Xiong, Section 3.1 Para 2, discloses:  “The second part of the system, the RL agent, is represented as a policy network $\pi_{\theta}(s, a) = p(a \mid s; \theta)$ which maps the state vector to a stochastic policy. The neural network parameters $\theta$ are updated using stochastic gradient descent.”  Here, Xiong discloses training (“stochastic gradient descent”) a learning model (“RL agent”) having a first set of parameters (“neural network parameters”), and thus improves at least one of the first or second models.  Xiong improves the model based on the new belief above, as the belief that “Band of Brothers” casts “Neal McDonough” is extended to generate the hypothesis that “Band of Brothers” is in “English”).
However, Xiong does not teach search a concept library including a set of conceptual models encoded as Bayesian Program Learning (BPL) models for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the BPL models including a set of dependencies among the set of conceptual models; search the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models; infer a plurality of beliefs from the plurality of facts using the set of inference models, to train a first BPL model having a first set of parameters; generate a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, to train a second BPL model having a second set of parameters.
Aslan teaches infer a plurality of beliefs from the plurality of facts using the set of inference models, to train a first [BPL] model having a first set of parameters; (Recall above that Xiong discloses inferring a plurality of beliefs from the plurality of facts and generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs.  Aslan discloses two models, wherein the output of the first model is used to train the second model.   Aslan, Para [0027], discloses:  “Instead, with the joint training technique of FIG. 1, the second model's 102 training can be influenced by the first model 100 (e.g., by the second model 102 having access to information about the outputs of the first model 100 based on the first model's 100 processing of the training data 104 as input) while the first model 100 is training, and/or prior to the first model 100 completing its training.”  Thus, when combined with Xiong, the output beliefs of an inference model from the set of inference models (“first model”) are used to train a “first model having a first set of parameters” (“second model”)).
*BPL will be taught by Lake below.
generate a plurality of hypotheses from the plurality of facts and the plurality of beliefs using the set of generation models, to train a second [BPL] model having a second set of parameters (Recall above that Xiong discloses generating a plurality of hypotheses from the plurality of facts and the plurality of beliefs.  Aslan discloses two models, wherein the output of the first model is used to train the second model.   Aslan, Para [0027], discloses:  “Instead, with the joint training technique of FIG. 1, the second model's 102 training can be influenced by the first model 100 (e.g., by the second model 102 having access to information about the outputs of the first model 100 based on the first model's 100 processing of the training data 104 as input) while the first model 100 is training, and/or prior to the first model 100 completing its training.”  Thus, when combined with Xiong, the output hypotheses of a generation model from the set of generation models (“first model”) are used to train a “second model having a second set of parameters” (“second model”).  
Note that Aslan’s “first model” and “second model” can also be applied across these two limitations, wherein Aslan’s “first model” is the “first model having a first set of parameters” and Aslan’s “second model” is the “second model having a second set of parameters”).
Xiong and Aslan are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong with the two models with joint training of Aslan.  One of ordinary skill in the art would be motivated to do so in order to achieve better performance with more accurate results (Aslan, Para [0008]:  “The joint model training techniques described herein provide greater flexibility as compared to current model training methods due to the ability of at least one model to influence the training of at least one other model during the joint training process. In this sense, a machine learning model is able to see what another machine learning model is learning, as the other machine learning model is learning. Furthermore, multiple machine learning models can be trained in a collaborative fashion where visibility across models is enabled, which can lead to one machine learning model selecting a learning function that is best suited for another machine learning model. Machine learning models that are trained using the techniques described herein can perform better (in terms of the accuracy of the model output) than conventionally-trained machine learning models in some scenarios. Furthermore, the machine learning models that are trained with the techniques and systems described herein can be deployed or implemented in a more versatile fashion.”)
However, the combination of Xiong and Aslan thus far fails to teach search a concept library including a set of conceptual models encoded as Bayesian Program Learning (BPL) models for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the BPL models including a set of dependencies among the set of conceptual models; search the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models;
	Walters teaches search a concept library including a set of conceptual models [encoded as Bayesian Program Learning (BPL) models] for a set of inference models to satisfy a set of inference criteria based on a set of boundary conditions, the set of inference models included in the set of conceptual models and the [BPL] models including a set of dependencies among the set of conceptual models; (Recall above that Xiong discloses inferring a belief based on a set of inference criteria.  Walters, Para [0005], discloses:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application.”  Here, Walters discloses a concept library (“model library”) that has a set of models (“conceptual models” and “inference models”, which would satisfy a “set of inference criteria”). The selection from the library is based on a set of boundary conditions (“model with the desired degree of accuracy can be selected”).  All machine learning models include a set of dependencies, upon which they are trained.  Walters also describes searching the model library.  Walters, Para [0188], describes the library as indexed:  “At step 1814, model optimizer 107 stores the optimized model. In some embodiments, the optimized model is stored in a model library. For example, model optimizer 107 may store the model in model storage 109. Storing the model at step 1814 may comprise updating an index of models.”  Walters, Para [0180], describes the searching of the model library:  “At step 1804, an input model is received by model optimizer 107. The input model may be one of a machine learning model or a statistical model, consistent with disclosed embodiments. In some embodiments, the input model is a seed model received at step 1802 via interface 113. In some embodiments, receiving the input model at step 1804 includes generating or retrieving a model based on at least one of the desired outcome, a model characteristic, or a model index. In some embodiments, receiving the input model at step 1804 includes retrieving the input model from a model storage (e.g., model storage 109). The model characteristic may include one of a model type, a data schema, a data statistic, a training dataset type, a model task, a hyperparameter, a training dataset, or an outcome associated with the model. For example, step 1804 may include selecting the candidate model from among a plurality of candidate models in model storage 109 based on a determination that the desired outcome corresponds to an outcome associated with the selected candidate model.”)
search the concept library for a set of generation models to satisfy a set of generation criteria based on the set of boundary conditions, the set of generation models included in the set of conceptual models (Walters, as shown above, discloses a model library, and searching that library for a model based on a set of boundary conditions.  Recall above also that Xiong discloses “generating a plurality of hypotheses”, and thus the set of conceptual models are “generation models” when combined with Xiong.)
Walters and the combination of Xiong and Aslan are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong and Aslan with the model library of Walters.  One of ordinary skill in the art would be motivated to do so in order to use the model best suited for a given situation, and to achieve desired accuracy (Walters, Para [0005]:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application. Furthermore, development of high-performance models can be enhanced through model re-use. For example, a user may develop a first model for a first application involving a dataset. Latent information and relationships present in the dataset may be embodied in the first model. The first model may therefore be a useful starting point for developing models for other applications involving the same dataset. For example, a model trained to identify animals in images may be useful for identifying parts of animals in the same or similar images (e.g., labeling the paws of a rat in video footage of an animal psychology experiment). However, manual hyperparameter tuning can be tedious and difficult. In addition, hyperparameter tuning …”)
	However, the combination of Xiong, Aslan, and Walters does not teach Bayesian Program Learning (BPL) model.
	Lake teaches Bayesian Program Learning (BPL) model.  (Lake, Pg 1333 First Full Paragraph, begins:  “This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people.”)
	Lake and the combination of Xiong, Aslan, and Walters are analogous art because they are both in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong, Aslan, and Walters with the BPL of Lake.  The combination would result in a knowledge graph reasoning model that is capable of “learning to learn” and making inferences based on data, as stated in Lake, Discussion:  “Despite a changing artificial intelligence landscape, people remain far better than machines at learning new concepts: They require fewer examples and use their concepts in richer ways. Our work suggests that the principles of compositionality, causality, and learning to learn will be critical in building machines that narrow this gap.”  One would be motivated to make this combination in order to save time and resources by eliminating the need to do training on hundreds or thousands of training examples (Lake, Pg 1333 First Full Paragraph:  “This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people.”)

	As per Claim 15, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14.  Xiong teaches wherein the first plurality of data records and the second plurality of data records include at least one of image data, video data, audio data, textual data, or time series data.  (Xiong, Figure 1, discloses a knowledge graph where the first and second plurality of data records comprise textual data, for example “United States” and “Nationality”, respectively.)

	As per Claim 16, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14.  Xiong teaches wherein the first plurality of (Xiong, Section 4.1, discloses:  “Table 1 shows the statistics of the two datasets we conduct our experiments on. Both of them are subsets of larger datasets. The triples in FB15K-237 (Toutanova et al., 2015) are sampled from FB15K (Bordes et al., 2013) with redundant relations removed.”  Here, Xiong discloses databases (“FB15K-237” and “FB15K”)).

	As per Claim 20, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14 and the second BPL model (see Rejection to Claim 1).  Xiong teaches wherein the second BPL model is at least one of a Bayesian inference model or a reinforcement learning model.  (Xiong, Section 2 Last Paragraph, discloses:  “NSM learns to compose programs that can find answers to natural language questions, while our RL model tries to add new facts to knowledge graph (KG) by reasoning on existing KG triples.”  Here, Xiong discloses a reinforcement learning model (“RL model”)).

Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Xiong, Aslan, Walters, and Lake in view of Salimans et al. (“Markov Chain Monte Carlo and Variational Inference: Bridging the Gap”; hereinafter “Salimans”).
As per Claim 4, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1 as well as the first BPL model and the second BPL model.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach improving the first BPL model and the second BPL model using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm.
Salimans teaches improving [the first BPL model and the second BPL model] using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm. (Salimans, Abstract, discloses:  “This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation.”)
Salimans and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the Bayesian learning of the combination of Xiong, Aslan, Walters, and Lake with the synthesis of variational inference and Monte Carlo of Salimans.  One would be motivated to do so in order to achieve greater speed and/or accuracy (Salimans, Abstract:  “By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy.”)

As per Claim 17, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14 as well as the first BPL model and the second BPL model.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach improving the first BPL model and the second BPL model using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm.
Salimans teaches improving [the first BPL model and the second BPL model] using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm. (Salimans, Abstract, discloses:  “This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation.”)
Salimans and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the Bayesian learning of the combination of Xiong, Aslan, Walters, and Lake with the synthesis of variational inference and Monte Carlo of Salimans.  One would be motivated to do so in order to achieve greater speed and/or accuracy (Salimans, Abstract:  “By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy.”)

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Xiong, Aslan, Walters, and Lake in view of Knijnik et al. (US 2018/0308159 A1; hereinafter “Knijnik”).
As per Claim 5, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1 as well as the set of inference criteria and the set of generation criteria.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach wherein the set of boundary conditions includes at least one of a dependency on a target belief, a derived boundary condition, or a restriction based on types of facts. 
Knijnik teaches wherein the set of boundary conditions includes at least one of a dependency on a target belief, a derived boundary condition, or a restriction based on types of facts.  (Knijnik, Para [0060], discloses:  “As opposed to the traditional credit scores available in the market, the present invention may automatically suggest an amount of money that a lender can lend to a relevant company according to some set of boundary conditions or established limits. This suggestion may be based on: (1) the predicted sales volume for the company, obtained as described above; (2) parameters inputted by the lender as an upper bound limit for the lending amount based on predicted sales volumes, financial history of the company, and the like. In embodiments of the present invention, some of the boundary conditions are: (i) 5% of the predicted annual sales volume; (ii) 17.5% of the predicted sales volume for the next 90 days; (iii) 18.5% of the predicted sales volume for the next 180 days, etc. In other embodiments of the present invention, the limit is set as a reduction factor based on the financial history between the borrower and the lender, e.g., 35% of the maximum historical lending amount. Once these factors are inputted for a given company, the present invention may present, based on the analysis presented above, the suggested maximum borrowing amount for that company, significantly reducing the effort expended in determining the amount to be lent, and thereby increasing the efficiency and security of the lender's operations.”  Here, Knijnik discloses inference criteria (criteria to “suggest an amount of money”), wherein the inference criteria include a restriction based on types of facts (“parameters inputted by the lender as an upper bound limit for the lending amount based on predicted sales volumes”).  This is similar to Instant Specification [0043], where “constraints in amount of loan” is given as an example of a boundary condition.)
Knijnik and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of the combination of Xiong, Aslan, Walters, and Lake with the predefined boundary condition of Knijnik.  The combination would result in one being able to focus the results of the knowledge graph reasoning on a precise problem according to a user’s specific needs, and one would be motivated to do so in order to reduce the effort and increase the efficiency of the specific task  (Knijnik [0060]:  “Once these factors are inputted for a given company, the present invention may present, based on the analysis presented above, the suggested maximum borrowing amount for that company, significantly reducing the effort expended in determining the amount to be lent, and thereby increasing the efficiency and security of the lender's operations.”)

As per Claim 19, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14 as well as the set of inference criteria and the set of generation criteria.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach wherein the set of boundary conditions includes at least one of a dependency on a target belief, a derived boundary condition, or a restriction based on types of facts. 
Knijnik teaches wherein the set of boundary conditions includes at least one of a dependency on a target belief, a derived boundary condition, or a restriction based on types of facts.  (Knijnik, Para [0060], discloses:  “As opposed to the traditional credit scores available in the market, the present invention may automatically suggest an amount of money that a lender can lend to a relevant company according to some set of boundary conditions or established limits. This suggestion may be based on: (1) the predicted sales volume for the company, obtained as described above; (2) parameters inputted by the lender as an upper bound limit for the lending amount based on predicted sales volumes, financial history of the company, and the like. In embodiments of the present invention, some of the boundary conditions are: (i) 5% of the predicted annual sales volume; (ii) 17.5% of the predicted sales volume for the next 90 days; (iii) 18.5% of the predicted sales volume for the next 180 days, etc. In other embodiments of the present invention, the limit is set as a reduction factor based on the financial history between the borrower and the lender, e.g., 35% of the maximum historical lending amount. Once these factors are inputted for a given company, the present invention may present, based on the analysis presented above, the suggested maximum borrowing amount for that company, significantly reducing the effort expended in determining the amount to be lent, and thereby increasing the efficiency and security of the lender's operations.”  Here, Knijnik discloses inference criteria (criteria to “suggest an amount of money”), wherein the inference criteria include a restriction based on types of facts (“parameters inputted by the lender as an upper bound limit for the lending amount based on predicted sales volumes”).  This is similar to Instant Specification [0043], where “constraints in amount of loan” is given as an example of a boundary condition.)
Knijnik and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of the combination of Xiong, Aslan, Walters, and Lake with the predefined boundary condition of Knijnik.  The combination would result in one being able to focus the results of the knowledge graph reasoning on a precise problem according to a user’s specific needs, and one would be motivated to do so in order to reduce the effort and increase the efficiency of the specific task  (Knijnik [0060]:  “Once these factors are inputted for a given company, the present invention may present, based on the analysis presented above, the suggested maximum borrowing amount for that company, significantly reducing the effort expended in determining the amount to be lent, and thereby increasing the efficiency and security of the lender's operations.”)

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Xiong, Aslan, Walters, and Lake in view of Knox et al. (“TAMER: Training an Agent Manually via Evaluative Reinforcement”; hereinafter “Knox”).
As per Claim 7, the combination of Xiong, Aslan, Walters, and Lake teaches the method of claim 1 as well as at least one hypothesis from the plurality of hypotheses and the first BPL model and the second BPL model.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach further comprising: receiving feedback on at least one hypothesis from the plurality of hypotheses; and improving at least one of the first BPL model or the second BPL model based on the feedback.
Knox teaches further comprising: receiving feedback [on at least one hypothesis from the plurality of hypotheses]; and improving [at least one of the first BPL model or the second BPL model] based on the feedback (Knox, Introduction Para 3, discloses:  “In this paper, we develop a method by which the human trainer can merely give positive and negative reinforcement signals (called “reward” in the learning agent community) to the agent. It only requires that a person can observe the agent’s behavior, judge its quality, and send a feedback signal that can be mapped to a scalar value (e.g. by button press or verbal feedback of “good” and “bad”).”  Here, Knox discloses receiving feedback in a reinforcement learning model.  Knox, Abstract, concludes:  “Leveraging the human trainers’ feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.”  Here, Knox discloses improving the model based on the feedback, as the model learns more quickly.)
Knox and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the reinforcement learning of the combination of Xiong, Aslan, Walters, and Lake with the feedback of Knox.  One would be motivated to do so in order to save time by having the model learn more quickly (Knox, Abstract:  “Leveraging the human trainers’ feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.”).

As per Claim 18, the combination of Xiong, Aslan, Walters, and Lake teaches the non-transitory processor-readable medium of claim 14 as well as at least one hypothesis from the plurality of hypotheses and the first BPL model and the second BPL model.  However, the combination of Xiong, Aslan, Walters, and Lake does not teach further comprising: receiving feedback on at least one hypothesis from the plurality of hypotheses; and improving at least one of the first BPL model or the second BPL model based on the feedback.
Knox teaches further comprising: receiving feedback [on at least one hypothesis from the plurality of hypotheses]; and improving [at least one of the first BPL model or the second BPL model] based on the feedback (Knox, Introduction Para 3, discloses:  “In this paper, we develop a method by which the human trainer can merely give positive and negative reinforcement signals (called “reward” in the learning agent community) to the agent. It only requires that a person can observe the agent’s behavior, judge its quality, and send a feedback signal that can be mapped to a scalar value (e.g. by button press or verbal feedback of “good” and “bad”).”  Here, Knox discloses receiving feedback in a reinforcement learning model.  Knox, Abstract, concludes:  “Leveraging the human trainers’ feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.”  Here, Knox discloses improving the model based on the feedback, as the model learns more quickly.)
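Knox and the combination of Xiong, Aslan, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the reinforcement learning of the combination of Xiong, Aslan, Walters, and Lake with the feedback of Knox.  One would be motivated to do so in order to save time by having the model learn more quickly (Knox, Abstract:  “Leveraging the human trainers’ feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.”).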

Claims 8-11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Walters, Lake, Knox, and Elfwing et al. (“Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning”; hereinafter “Elfwing”).
As per Claim 8, Xiong teaches an apparatus, comprising: a memory; and a processor operatively coupled to the memory (Xiong, Acknowledgments, discloses:  “We gratefully acknowledge the support of NVIDIA Corporation with the donation of one Titan X Pascal GPU used for this research.”  Here, Xiong discloses the use of a processor, which also implies the use of a memory operatively coupled with the processor).
the processor configured to receive a knowledge graph data structure, the knowledge graph data structure including at least an association of a first entity record from an entity dataset with a second entity record from the entity dataset (Xiong, Figure 1, discloses:

[Image: Xiong, Figure 1 (knowledge graph), not reproduced]

Here, Xiong discloses a knowledge graph data structure, which includes associations between first and second entity records (the arrows) from an entity dataset (the ovals)).
	the processor configured to train a [Bayesian Program Learning (BPL)] model that generates a plurality of hypotheses based on [the at least one conceptual model,] the entity dataset and the knowledge graph data structure[, the plurality of hypotheses following a sigmoid function]  (Xiong, Figure 1 Caption, discloses:  “The dotted arrows (partially) show the existing relation links in the KG and the bold arrows show the reasoning paths found by the RL agent”.  Here, Xiong discloses generating a plurality of hypotheses (“bold arrows”) from the plurality of facts (“dotted arrows”), which are based on the entity dataset and the knowledge graph data structure.  This is a “hypothesis” because Xiong describes the bold arrows as “reasoning paths”, and the “reasoning” is used to form a hypothesis.  Xiong, Section 3.1 Para 2, discloses:  “The second part of the system, the RL agent, is represented as a policy network $\pi_\theta(s, a) = p(a \mid s; \theta)$ which maps the state vector to a stochastic policy. The neural network parameters $\theta$ are updated using stochastic gradient descent.”  Here, Xiong discloses that this is achieved by training (“stochastic gradient descent”) a learning model (“RL agent”)).
*BPL will be taught by Lake below
*The at least one conceptual model will be taught by Walters below
*Sigmoid function will be taught by Elfwing below.
	the processor configured to generate, using the [BPL] model, at least one hypothesis in response to input data associated with an entity (Xiong, as shown above, discloses generating a plurality of hypotheses based on a model.  In order for the knowledge graph of Xiong Figure 1 to have been created, the data would have had to be input first.  Thus, the generating of at least one hypothesis is in response to input data, which represents the whole knowledge graph.  That input data is associated with an entity (indeed, with every entity in the graph).  And as shown above, the hypotheses are generated using a model.) *BPL will be taught by Lake below
	However, Xiong does not teach the processor configured to search a concept library including a set of conceptual models encoded as Bayesian Program Learning (BPL) models for at least one conceptual model from the set of conceptual models satisfying a set of criteria based on a set of boundary conditions, the BPL models including a set of dependencies among the set of conceptual models; the plurality of hypotheses based on the at least one conceptual model; Bayesian Program Learning (BPL) model; the processor configured to receive feedback on the at least one hypothesis; and the processor configured to further train the BPL model based on the feedback; the plurality of hypotheses following a sigmoid function
Walters teaches the processor configured to search a concept library including a set of conceptual models [encoded as Bayesian Program Learning (BPL) models] for at least one conceptual model from the set of conceptual models satisfying a set of criteria based on a set of boundary conditions, the [BPL] models including a set of dependencies among the set of conceptual models (Walters, Para [0005], discloses:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application.”  Here, Walters discloses a concept library (“model library”) that has a set of models (“conceptual models”, which would satisfy a “set of criteria”). The selection from the library is based on a set of boundary conditions (“model with the desired degree of accuracy can be selected”).  All machine learning models include a set of dependencies, upon which they are trained.  Walters also describes searching the model library.  Walters, Para [0188], describes the library as indexed:  “At step 1814, model optimizer 107 stores the optimized model. In some embodiments, the optimized model is stored in a model library. For example, model optimizer 107 may store the model in model storage 109. Storing the model at step 1814 may comprise updating an index of models.”  Walters, Para [0180], describes the searching of the model library:  “At step 1804, an input model is received by model optimizer 107. The input model may be one of a machine learning model or a statistical model, consistent with disclosed embodiments. In some embodiments, the input model is a seed model received at step 1802 via interface 113. In some embodiments, receiving the input model at step 1804 includes generating or retrieving a model based on at least one of the desired outcome, a model characteristic, or a model index. In some embodiments, receiving the input model at step 1804 includes retrieving the input model from a model storage (e.g., model storage 109). The model characteristic may include one of a model type, a data schema, a data statistic, a training dataset type, a model task, a hyperparameter, a training dataset, or an outcome associated with the model. For example, step 1804 may include selecting the candidate model from among a plurality of candidate models in model storage 109 based on a determination that the desired outcome corresponds to an outcome associated with the selected candidate model.”  Recall that Xiong discloses generating a plurality of hypotheses. Thus, in combination with Xiong, Walters also teaches the limitation “the plurality of hypotheses based on the at least one conceptual model”.)
Walters and Xiong are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of Xiong with the model library of Walters.  One of ordinary skill in the art would be motivated to do so in order to use the model best suited for a given situation and to achieve a desired accuracy (Walters, Para [0005]:  “There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application.”)
However, the combination of Xiong and Walters thus far fails to teach: the Bayesian Program Learning (BPL) model; the processor configured to receive feedback on the at least one hypothesis; the processor configured to further train the BPL model based on the feedback; and the plurality of hypotheses following a sigmoid function.
	Lake teaches the Bayesian Program Learning (BPL) model.  (Lake, Pg 1333 First Full Paragraph, begins:  “This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people.”)
	Lake and the combination of Xiong and Walters are analogous art because they are both in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the knowledge graph hypothesis generation of the combination of Xiong and Walters with the BPL of Lake.  The combination would result in a knowledge graph reasoning model that is capable of “learning to learn” and making inferences based on data, as stated in Lake, Discussion:  “Despite a changing artificial intelligence landscape, people remain far better than machines at learning new concepts: They require fewer examples and use their concepts in richer ways. Our work suggests that the principles of compositionality, causality, and learning to learn will be critical in building machines that narrow this gap.”  One would be motivated to make this combination in order to save time and resources by eliminating the need to do training on hundreds or thousands of training examples (Lake, Pg 1333 first full paragraph:  “This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people. Concepts are represented as simple probabilistic programs—that is, probabilistic generative models expressed as structured procedures in an abstract description language (17, 18). Our framework brings together three key ideas—compositionality, causality, and learning to learn—that have been separately influential in cognitive science and machine learning over the past several decades (19–22). As programs, rich concepts can be built “compositionally” from simpler primitives. Their probabilistic semantics handle noise and support creative generalizations in a procedural form that (unlike 
	However, the combination of Xiong, Walters, and Lake thus far fails to teach: the processor configured to receive feedback on the at least one hypothesis; the processor configured to further train the BPL model based on the feedback; and the plurality of hypotheses following a sigmoid function.
	Knox teaches the processor configured to receive feedback [on the at least one hypothesis]; and the processor configured to further train the [BPL] model based on the feedback. (Knox, Introduction Para 3, discloses:  “In this paper, we develop a method by which the human trainer can merely give positive and negative reinforcement signals (called “reward” in the learning agent community) to the agent. It only requires that a person can observe the agent’s behavior, judge its quality, and send a feedback signal that can be mapped to a scalar value (e.g. by button press or verbal feedback of “good” and “bad”).”  Here, Knox discloses receiving feedback in a reinforcement learning model.  Knox, Abstract, concludes:  “Leveraging the human trainers’ feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.”  Here, Knox discloses further training the model based on the feedback, as the model learns more quickly.)
Knox and the combination of Xiong, Walters, and Lake are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the reinforcement learning of the combination of Xiong, Walters, and Lake with the 
	However, the combination of Xiong, Walters, Lake, and Knox thus far fails to explicitly teach the plurality of hypotheses following a sigmoid function.
	Elfwing teaches the plurality of hypotheses following a sigmoid function. (Elfwing, Abstract, discloses:  “First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input.”  Here, Elfwing discloses following a sigmoid function through the “sigmoid function multiplied by its input”, which is much like the Instant Specification, Para [0039]:  “For example, an output layer of the generation model parameters can follow a sigmoid function, or in other words, be multiplied by the sigmoid function as the activation function of the output layer.”)
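For illustration only, the following Python sketch computes the activation quoted from the Elfwing abstract: the SiLU, i.e., the input x multiplied by the sigmoid of x, and the dSiLU, its derivative σ(x)(1 + x(1 − σ(x))). The function names sigmoid, silu, and dsilu are hypothetical labels chosen for this sketch and are not identifiers from Elfwing or the Instant Specification. The branched, overflow-safe form of the sigmoid is an implementation assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), branched so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for negative x this is a small positive number
    return z / (1.0 + z)


def silu(x: float) -> float:
    """Hypothetical helper: SiLU, the sigmoid of the input multiplied by the input."""
    return x * sigmoid(x)


def dsilu(x: float) -> float:
    """Hypothetical helper: dSiLU, the derivative of SiLU,
    sigma(x) * (1 + x * (1 - sigma(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(f"x={x:+.1f}  silu={silu(x):+.4f}  dsilu={dsilu(x):+.4f}")
```

Because σ(0) = 0.5, silu(0) evaluates to 0 and dsilu(0) evaluates to 0.5; for large positive x, silu(x) approaches x, and for large negative x it approaches 0.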
Elfwing and the combination of Xiong, Walters, Lake, and Knox are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the reinforcement learning of the combination of Xiong, Walters, Lake, and Knox with the sigmoid-weighted linear unit (SiLU) of Elfwing.  One would be motivated to do so in order to achieve significantly better performance (Elfwing, Conclusion:  “In this study, we proposed SiLU and dSiLU as activation functions for neural network function approximation in reinforcement 
	
As per Claim 9, the combination of Xiong, Walters, Lake, Knox, and Elfwing teaches the apparatus of claim 8.  Xiong teaches wherein the entity dataset includes at least one of image data, video data, audio data, textual data, or time series data.  (Xiong, Figure 1, discloses a knowledge graph where the first and second plurality of data records comprise textual data, for example “United States” and “Nationality”, respectively.)

As per Claim 10, the combination of Xiong, Walters, Lake, Knox, and Elfwing teaches the apparatus of claim 8.  Xiong teaches wherein the entity dataset includes at least one of structured data, semi-structured data, or unstructured data.  (Xiong, Figure 1, discloses a knowledge graph where the entity dataset comprises textual data, for example “United States” and “Nationality”.  These are plain text rather than, for example, vector embeddings, and thus are unstructured data.)

As per Claim 11, the combination of Xiong, Walters, Lake, Knox, and Elfwing teaches the apparatus of claim 8. Xiong teaches wherein the knowledge graph data structure is received from at least one of a database, a file system, or an application.  (Xiong, Section 4.1, discloses:  “Table 1 shows the statistics of the two datasets we conduct our experiments on. Both of them are subsets of larger datasets. The triples in FB15K-237 (Toutanova et al., 2015) are sampled from FB15K (Bordes et al., 2013) with redundant relations removed.”  Here, Xiong discloses databases (“FB15K-237” and “FB15K”)).

As per Claim 13, the combination of Xiong, Walters, Lake, Knox, and Elfwing teaches the apparatus of claim 8 and the second BPL model (see Rejection to Claim 1).  Xiong teaches wherein the second BPL model is at least one of a Bayesian inference model or a reinforcement learning model.  (Xiong, Section 2 Last Paragraph, discloses:  “NSM learns to compose programs that can find answers to natural language questions, while our RL model tries to add new facts to knowledge graph (KG) by reasoning on existing KG triples.”  Here, Xiong discloses a reinforcement learning model (“RL model”)).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Xiong, Walters, Lake, Knox, and Elfwing in view of Salimans.
As per Claim 12, the combination of Xiong, Walters, Lake, Knox, and Elfwing teaches the apparatus of claim 8 as well as the first BPL model and the second BPL model.  However, the combination of Xiong, Walters, Lake, Knox, and Elfwing does not teach improving the first BPL model and the second BPL model using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm.
Salimans teaches improving [the first BPL model and the second BPL model] using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm. (Salimans, Abstract, discloses:  “This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation.”)
Salimans and the combination of Xiong, Walters, Lake, Knox, and Elfwing are analogous art because they are both in the field of machine learning.  
It would have been obvious before the effective filing date of the claimed invention to combine the Bayesian learning of the combination of Xiong, Walters, Lake, Knox, and Elfwing with the synthesis of variational inference and Monte Carlo of Salimans.  One would be motivated to do so in order to achieve greater speed and/or accuracy (Salimans, Abstract:  “By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy.”)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/L.A.S./Examiner, Art Unit 2126
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145