DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Status of Claims
The following claims are pending in this office action: 1, 3-9, 11-18, and 20-23
The following claims are amended: 1, 9, and 17
The following claims are new: None
The following claims are cancelled: None
The following claims are rejected: 1, 3-9, 11-18, and 20-23
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/24/2022 has been entered.
Response to Arguments
Applicant's arguments filed 02/24/2022 to address the 35 U.S.C. 103 rejection have been fully considered but they are not persuasive. 
Applicant argues Su does not teach a pretrained neural network generating rewards between only zero and one (see Applicant's remarks, pages 7-8). Examiner respectfully disagrees. Das discloses generating a reward value of either 0 or 1 in Section 2.1, Rewards.
Applicant also argues that it would not have been obvious to combine the cited prior art references of Das and Srivastava because they teach away (see Applicant's remarks, pages 9-11). Examiner respectfully disagrees. Examiner notes that Para. [0015] of Applicant's disclosure performs “action dropout which randomly masks some edges of a node in the knowledge graph during training”. Srivastava performs exactly that, as can be seen in Figure 1 of Srivastava, where a subset of the edges of nodes is temporarily masked and hidden, so that only a subset of edges can be traversed. Thus, it would have been obvious to a person of ordinary skill in the art to have combined the two references, and Examiner respectfully asserts that the combination of the cited prior art sufficiently teaches what Applicant is arguing. Examiner encourages Applicant to more distinctly claim the masking aspect of the claimed invention such that regular dropout techniques may not be applicable.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
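(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.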
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
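This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: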
“a reasoning module” in claim 17
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
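The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
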
Claims 1, 4-8, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over “GO FOR A WALK AND ARRIVE AT THE ANSWER: REASONING OVER PATHS IN KNOWLEDGE BASES USING REINFORCEMENT LEARNING” to Das, et al. (hereinafter, “Das”), in view of “Dropout: A Simple Way to Prevent Neural Networks from Overfitting” to Srivastava, et al. (hereinafter, “Srivastava”), and further in view of “Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems” to Su, et al. (hereinafter, “Su”).
As per claim 1, Das teaches a method of training a policy network to search relational paths in a knowledge graph, the method comprising
identifying, using a reasoning module, a plurality of first outgoing links from a current node in the knowledge graph; (Das, Section 2.1 discloses “From the KB, a knowledge graph G can be constructed where the entities e1, e2 are represented as the nodes and relation r as labeled edge between them”)
[[masking]], using the reasoning module, one or more links from the plurality of first outgoing links to form a plurality of second outgoing links to a subset of nodes adjacent to the current node (Das, Section 2.3 discloses “To encourage the policy to sample more diverse paths rather than sticking with a few, we add an entropy regularization term to our cost function after multiplying it by a constant (β). We treat β as a hyperparameter to control the exploration exploitation trade-off.” (Knowledge graph contains edge between nodes)); 
traversing the knowledge graph using the plurality of second outgoing links; (Das, 3rd Page, 2nd Para. discloses “Starting from vertex corresponding to e1q in the knowledge graph G, the agent learns to traverse the environment/graph to mine the answer and stop when it determines the answer (§ 2.2)”)
rewarding the reasoning module with a reward of one when a node in the subset of nodes corresponds to an observed answer is reached; (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end…”)
and rewarding the reasoning module with the reward identified by [[a reward shaping network]] when a node not corresponding to the observed answer is reached, wherein [[the reward shaping network is a pre-trained neural network that generates the reward having a value in values]] between zero and one, wherein the value indicates a likelihood that the observed answer is reachable from the node (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e., R(ST) = I{et = e2q}” and 2nd Page, 2nd Para. discloses “This formulates the query-answering task as a reinforcement learning (RL) problem where the goal is to take an optimal sequence of decisions (choices of relation edges) to maximize the expected reward (reaching the correct answer node)” (Determining a reward by traversing a knowledge graph wherein a reward of 0 or 1 is determined if answer reached))
Das fails to explicitly teach:
masking
However, Srivastava (Srivastava addresses dropout) teaches:
masking (Srivastava, 2nd page, Last Para. discloses “The term “dropout” refers to dropping out units (hidden and visible)… By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections…”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the policy network as disclosed by Das to use masking as disclosed by Srivastava. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “prevent units from co-adapting too much” and improve performance of the policy network (Srivastava, Abstract)
Das fails to explicitly teach:
a reward shaping network
the reward shaping network is a pre-trained neural network that generates the reward having a value in values
However, Su (Su addresses reward shaping) teaches:
a reward shaping network (Su, Whole document discloses reward shaping with recurrent neural networks)
the reward shaping network is a pre-trained neural network that generates the reward having a value in values (Su, Abstract discloses “Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster” and Section 2.1 Reward shaping discloses “Reward shaping provides the system with an extra reward signal F in addition to environmental reward R, making the system learn from the composite signal R + F. The shaping reward F often encodes expert knowledge that complements the sparse signal R. Since the reward function defines the system’s objective, changing it may result in a different task. When the task is modelled as a fully observable Markov decision process (MDP), Ng et al. (1999) defined formal requirements on the shaping reward as a difference of any potential function φ on consecutive states s and s′ which preserves the optimality of policies”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the policy network as disclosed by Das to use reward shaping as disclosed by Su. The combination would have been obvious because a person of ordinary skill in the art would be motivated “to increase policy learning speed” and “to guide the agent towards good behaviour faster” (Su, Abstract).
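For illustration only, the reward rule recited in claim 1 may be sketched in Python as follows. The names `shaped_reward` and `score_fn` are hypothetical; `score_fn` merely stands in for a pre-trained reward shaping network, and neither the cited references nor the claims supply this code. The clamp to the range [0, 1] is an assumption made so that the sketch matches the claimed range.

```python
from typing import Callable, Hashable


def shaped_reward(node: Hashable, answer: Hashable,
                  score_fn: Callable[[Hashable], float]) -> float:
    """Return 1.0 at the observed answer, otherwise a shaped value in [0, 1]."""
    if node == answer:
        # Terminal reward of one when the observed answer is reached.
        return 1.0
    # Hypothetical stand-in for a pre-trained scorer estimating how likely
    # the observed answer is reachable from this node.
    score = score_fn(node)
    # Assumption: clamp so the reward always lies between zero and one.
    return min(1.0, max(0.0, score))
```

Under these assumptions, a node other than the observed answer receives whatever value the scorer returns, limited to the range from zero to one.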

As per claim 4, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Srivastava further teaches:
wherein the masking is based on a Bernoulli distribution (Srivastava, Section 10 discloses “Dropout involves multiplying hidden activations by Bernoulli distributed random variables which take the value 1 with probability p and 0 otherwise”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das with the teachings of Srivastava for at least the same reasons as discussed above in claim 1
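As a non-authoritative sketch, masking outgoing links with Bernoulli-distributed random variables may look like the following Python. The function `mask_outgoing_edges`, the `Edge` type, and the fallback that retains one edge when every edge would otherwise be masked are assumptions introduced for illustration; they are not taken from Das, Srivastava, or the claims.

```python
import random
from typing import List, Optional, Tuple

Edge = Tuple[str, str]  # (relation, target node); hypothetical representation


def mask_outgoing_edges(edges: List[Edge], keep_prob: float,
                        rng: Optional[random.Random] = None) -> List[Edge]:
    """Keep each edge independently with probability keep_prob."""
    if rng is None:
        rng = random.Random()
    # One Bernoulli draw per edge: the edge survives when the draw is 1.
    kept = [edge for edge in edges if rng.random() < keep_prob]
    # Assumption: never leave the agent without an action to take.
    if not kept and edges:
        kept = [rng.choice(edges)]
    return kept
```

The returned list corresponds to the plurality of second outgoing links, i.e., a subset of the first outgoing links from the current node.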
As per claim 5, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches:
wherein the policy network uses a REINFORCE algorithm. (Das, Section 2.3 discloses “To solve this optimization problem, we employ REINFORCE (Williams, 1992) as follows:…”)
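For context, a single REINFORCE update for a softmax policy over a fixed set of actions may be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Das: the function names and the learning rate `lr` are hypothetical, the policy is a bare vector of logits rather than a neural network, and any baseline and the entropy regularization term of Das Section 2.3 are omitted.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: List[float], action: int, reward: float,
                   lr: float = 0.1) -> List[float]:
    """Gradient ascent on reward * log pi(action) for a softmax policy."""
    probs = softmax(logits)
    # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi_k.
    return [
        theta + lr * reward * ((1.0 if k == action else 0.0) - p)
        for k, (theta, p) in enumerate(zip(logits, probs))
    ]
```

A positive reward raises the logit of the sampled action relative to the others, and a reward of zero leaves the parameters unchanged.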

As per claim 6, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches the method further comprising: 
receiving a query; (Das, Abstract discloses “Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query…”)
and generating, using the reasoning module and the policy network, the observed answer in response to the query. (Das, Abstract discloses “Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query to find predictive paths” and 2nd Page, 2nd Para. discloses “This paper presents a method for efficiently searching the graph for answer-providing paths using reinforcement learning (RL) conditioned on the input question…” and 3rd Page, 2nd Para. discloses “Our RL agent is given an input query of the form (e1q, rq, ?). Starting from vertex corresponding to e1q in the knowledge graph G, the agent learns to traverse the environment/graph to mine the answer and stop when it determines the answer (§ 2.2)”)

As per claim 7, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches: 
wherein the knowledge graph is an incomplete knowledge graph. (Das, Abstract discloses “KBs are highly incomplete (Min et al., 2013), and facts not directly stored in a KB can often be inferred from those that are, creating exciting opportunities and challenges for automated reasoning”)

As per claim 8, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches the method further comprising: 
after training the policy network, using the policy network to generate a set of answers in response to receiving a second query. (Das, Section 2.3 discloses Training, and Section 3 discloses “We test our model on the following query answering datasets” and section 4 discloses Experiments which generates sets of answers in response to second or more queries)

As per claim 21, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Srivastava further teaches: 
	wherein the masking randomly masks the one or more links from the plurality of first outgoing links (Srivastava, 2nd page, Last Para. discloses “The term “dropout” refers to dropping out units (hidden and visible)… By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections…”)
	It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das with the teachings of Srivastava for at least the same reasons as discussed above in claim 1

As per claim 22, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches the method further comprising: 
	training the policy network over multiple iterations (Das, Section 2.3 discloses training of the policy network)

As per claim 23, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 22, Das further teaches the method further comprising: 
	maximizing a reward of the policy network over the multiple iterations, wherein the reward is a combination of rewards determining over multiple iterations (Das, Introduction, 2nd page discloses “This formulates the query-answering task as a reinforcement learning (RL) problem where the goal is to take an optimal sequence of decisions (choices of relation edges) to maximize the expected reward (reaching the correct answer node)” and Section 2.3 discloses “For the policy network (πθ) described above, we want to find parameters θ that maximizes the expected reward”)

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Das, in view of Srivastava, further in view of Su, and further in view of “Convolutional 2D Knowledge Graph Embeddings” to Dettmers, et al. (hereinafter, “Dettmers”).
As per claim 3, the combination of Das, Srivastava, and Su as shown above teaches the method of claim 1, Das further teaches:
wherein the reward is generated using a function: 

[equation: media_image1.png]

wherein ST is a state at a target entity of a third node in the knowledge graph, es is a start entity of the current node, rq is a relation of a query received by the policy network, and eT is a target entity of the node corresponding to the observed answer, function f is a composition function over entities in the knowledge graph, and Rb(st) is a function determining a reward value (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e., R(ST) = I{et = e2q}” (Das is teaching the first two parts of the Reward function))
The combination of Das, Srivastava, and Su fails to explicitly teach:

[equation: media_image2.png]

However, Dettmers (Dettmers addresses graph embeddings) teaches:

[equation: media_image2.png] (Dettmers, Section 4 discloses [equation: media_image3.png], where f is a nonlinear function)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das as modified to use the graph embedding model as disclosed by Dettmers. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “predict new links in knowledge graphs” (Dettmers, 2nd Page, 4th Para.)
Claims 9 and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Das, in view of Srivastava, further in view of Su, and further in view of U.S. Pub. No. US 20150286709 A1 to Sathish, et al. (hereinafter, “Sathish”).
As per claim 9, Das teaches a system of training a policy network to search relational paths in a knowledge graph, the system comprising
Identify, using the reasoning module, a plurality of first outgoing links from a current node in the knowledge graph to nodes adjacent to the current node; (Das, Section 2.1 discloses “From the KB, a knowledge graph G can be constructed where the entities e1, e2 are represented as the nodes and relation r as labeled edge between them”)
[[mask]] one or more links from the plurality of first outgoing links to form a plurality of second outgoing links using a [[mask]], the plurality of second outgoing links from the current node in a subset of the nodes adjacent to the current node (Das, Section 2.3 discloses “To encourage the policy to sample more diverse paths rather than sticking with a few, we add an entropy regularization term to our cost function after multiplying it by a constant (β). We treat β as a hyperparameter to control the exploration exploitation trade-off.” (Knowledge graph contains a plurality of outgoing edges between nodes)); 
traverse the knowledge graph using the plurality of second outgoing links; (Das, 3rd Page, 2nd Para. discloses “Starting from vertex corresponding to e1q in the knowledge graph G, the agent learns to traverse the environment/graph to mine the answer and stop when it determines the answer (§ 2.2)”)
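reward the reasoning module with a reward of one when a node corresponding to an observed answer is reached; (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end…”)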
and reward the reasoning module with the reward identified by [[a reward shaping network]] when a node not corresponding to the observed answer is reached, wherein [[the reward shaping network is a pre-trained neural network that generates the reward having a value]] between zero and one (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e., R(ST) = I{et = e2q}”)
Das fails to explicitly teach:
mask
However, Srivastava teaches:
mask (Srivastava, 2nd page, Last Para. discloses “The term “dropout” refers to dropping out units (hidden and visible)… By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections…”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the policy network as disclosed by Das to use masking as disclosed by Srivastava. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “prevent units from co-adapting too much” and improve performance of the policy network (Srivastava, Abstract)
Das fails to explicitly teach:
a reward shaping network
the reward shaping network is a pre-trained neural network that generates the reward having a value
However, Su teaches:
a reward shaping network (Su, Whole document discloses reward shaping with recurrent neural networks)
the reward shaping network is a pre-trained neural network that generates the reward having a value (Su, Abstract discloses “Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster” and Section 2.1 Reward shaping discloses “Reward shaping provides the system with an extra reward signal F in addition to environmental reward R, making the system learn from the composite signal R + F. The shaping reward F often encodes expert knowledge that complements the sparse signal R. Since the reward function defines the system’s objective, changing it may result in a different task. When the task is modelled as a fully observable Markov decision process (MDP), Ng et al. (1999) defined formal requirements on the shaping reward as a difference of any potential function φ on consecutive states s and s′ which preserves the optimality of policies”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the policy network as disclosed by Das to use reward shaping as disclosed by Su. The combination would have been obvious because a person of ordinary skill in the art would be motivated “to increase policy learning speed” and “to guide the agent towards good behaviour faster” (Su, Abstract).
Das fails to explicitly teach:
a memory
a processor coupled to the memory and configured to
However, Sathish (Sathish addresses retrieving data from knowledge graphs) teaches:
a memory (Sathish, Para. [0196] discloses “a memory”)
a processor coupled to the memory and configured to (Sathish, Para. [0198] discloses “At the time of execution, the instructions may be fetched from the corresponding memory 2205 and/or storage 2206, and executed by the processing unit 2204.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the policy network as disclosed by Das to use the memory and processor as disclosed by Sathish. The combination would have been obvious because a person of ordinary skill in the art would be motivated to perform operations and algorithms of the system.

As per claim 12, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the system of claim 9, Srivastava further teaches:
wherein the mask is based on a Bernoulli distribution (Srivastava, Section 10 discloses “Dropout involves multiplying hidden activations by Bernoulli distributed random variables which take the value 1 with probability p and 0 otherwise”)


As per claim 13, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the system of claim 9, Das further teaches:
wherein the policy network uses a REINFORCE algorithm. (Das, Section 2.3 discloses “To solve this optimization problem, we employ REINFORCE (Williams, 1992) as follows:…”)

As per claim 14, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the system of claim 9, Das further teaches the processor further configured to: 
receive a query; (Das, Abstract discloses “Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query…”)
and generate, using the reasoning module and the policy network, the observed answer in response to the query. (Das, Abstract discloses “Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query to find predictive paths” and 2nd Page, 2nd Para. discloses “This paper presents a method for efficiently searching the graph for answer-providing paths using reinforcement learning (RL) conditioned on the input question…” and 3rd Page, 2nd Para. discloses “Our RL agent is given an input query of the form (e1q, rq, ?). Starting from vertex corresponding to e1q in the knowledge graph G, the agent learns to traverse the environment/graph to mine the answer and stop when it determines the answer (§ 2.2)”)

As per claim 15, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the system of claim 9, Das further teaches: 
wherein the knowledge graph is an incomplete knowledge graph. (Das, Abstract discloses “KBs are highly incomplete (Min et al., 2013), and facts not directly stored in a KB can often be inferred from those that are, creating exciting opportunities and challenges for automated reasoning”)

As per claim 16, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the system of claim 9, Das further teaches: 
wherein the policy network is further configured to generate a set of answers in response to receiving a second query after the policy network is trained (Das, Section 2.3 discloses Training, and Section 3 discloses “We test our model on the following query answering datasets” and section 4 discloses Experiments which generates sets of answers in response to second or more queries)

	As per claim 17, Das teaches operations that train a policy network to search relational paths in an incomplete knowledge graph, the operations comprising (Das, Conclusion discloses “We explored a new way of automated reasoning on large knowledge bases in which we use the knowledge graphs representation of the knowledge base and train an agent to walk to the answer node conditioned on the input query.”):
	receiving a query (Das, Abstract discloses “Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query…”)
identifying, using a reasoning module, a plurality of first outgoing links from a starting node in the knowledge graph, the reasoning module trained using [[a reward shaping network to reward]] the reasoning modules using [[a reward]] having values between only zero and one (Das, Section 2.1 discloses “From the KB, a knowledge graph G can be constructed where the entities e1, e2 are represented as the nodes and relation r as labeled edge between them”, Section 2.3 discloses training and Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e., R(ST) = I{et = e2q}”)
[[masking]], using the reasoning module, one or more links from the plurality of first outgoing links to form a plurality of second outgoing links to a subset of nodes adjacent to the start node (Das, Section 2.3 discloses “To encourage the policy to sample more diverse paths rather than sticking with a few, we add an entropy regularization term to our cost function after multiplying it by a constant (β). We treat β as a hyperparameter to control the exploration exploitation trade-off.” (Knowledge graph contains a plurality of edges between nodes)); 
causes [[the reward shaping module to generate rewards with values indicating]] a likelihood the observed answer is reachable from the second outgoing links (Das, Sections 2.1-2.3 disclose traversing a knowledge graph until an answer is found or not found whereby the knowledge may be incomplete)
Das fails to explicitly teach:
masking
However, Srivastava teaches:
masking (Srivastava, 2nd page, Last Para. discloses “The term “dropout” refers to dropping out units (hidden and visible)… By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections…”)
	It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das with the teachings of Srivastava for at least the same reasons as discussed above in claim 9
	Das fails to explicitly teach:
	a reward shaping network to reward
the reward shaping module to generate rewards with values indicating
	However, Su teaches:
a reward shaping network to reward (Su, Abstract discloses “Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster” and Section 2.1 Reward shaping discloses “Reward shaping provides the system with an extra reward signal F in addition to environmental reward R, making the system learn from the composite signal R + F. The shaping reward F often encodes expert knowledge that complements the sparse signal R. Since the reward function defines the system’s objective, changing it may result in a different task. When the task is modelled as a fully observable Markov decision process (MDP), Ng et al. (1999) defined formal requirements on the shaping reward as a difference of any potential function φ on consecutive states s and s′ which preserves the optimality of policies”)
the reward shaping module to generate rewards with values indicating (Su, Abstract discloses “Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster” and Section 2.1 Reward shaping discloses “Reward shaping provides the system with an extra reward signal F in addition to environmental reward R, making the system learn from the composite signal R + F. The shaping reward F often encodes expert knowledge that complements the sparse signal R. Since the reward function defines the system’s objective, changing it may result in a different task. When the task is modelled as a fully observable Markov decision process (MDP), Ng et al. (1999) defined formal requirements on the shaping reward as a difference of any potential function φ on consecutive states s and s′ which preserves the optimality of policies”)
	It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das with the teachings of Su for at least the same reasons as discussed above in claim 9
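For illustration only, the composite signal R + F described in the quoted Su passage can be sketched as follows. This is a minimal Python sketch, assuming the potential-based form F = γφ(s′) − φ(s) attributed to Ng et al. (1999); the function names, the integer states, and the example potential are hypothetical and are not taken from Su or the application. With γ = 1 the shaping term reduces to a plain difference of potentials on consecutive states.

```python
def shaping_term(phi, s, s_next, gamma=1.0):
    """Potential-based shaping term F = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)

def composite_reward(env_reward, phi, s, s_next, gamma=1.0):
    """Composite signal R + F that the agent learns from."""
    return env_reward + shaping_term(phi, s, s_next, gamma)

# Hypothetical potential: states are integers and the goal is state 3,
# so the potential is the negative distance to the goal.
def phi(state):
    return -abs(3 - state)

# Moving from state 1 to state 2 with no environmental reward.
step_reward = composite_reward(0.0, phi, s=1, s_next=2)
```

In this sketch a step toward the goal earns a positive shaping term even when the environmental reward R is zero, which is the sense in which F complements a sparse R.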
	Das fails to explicitly teach:
	a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations that
 	However, Sathish teaches:
a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations that (Sathish, Para. [0012] discloses “In accordance with another aspect of the present disclosure, a computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium is provided.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das with the teachings of Sathish for at least the same reasons as discussed above in claim 9

	As per claim 18, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the non-transitory machine-readable medium of claim 17, Das further teaches wherein the operations further comprise:
(Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end…”)

Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Das, in view of Srivastava, further in view of Su, further in view of Sathish, and further in view of Dettmers
As per claim 11, the combination of Das, Srivastava, and Su as shown above teaches the system of claim 9, Das further teaches:
wherein the reward is generated using a function: 

[Equation image: media_image1.png]

wherein ST is a state at a target entity of a third node in the knowledge graph, es is a start entity of the current node, rq is a relation of a query received by the policy network, and eT is a target entity of the node corresponding to the observed answer, function f is a composition function over entities in the knowledge graph, and Rb(st) is a function determining a reward value (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e. R(ST) = I{et = e2q}” (Das is teaching the first two parts of the Reward function))
The combination of Das, Srivastava, Su, and Sathish fails to explicitly teach:

[Equation image: media_image2.png]

However, Dettmers teaches:

[Equation image: media_image2.png] (Dettmers, Section 4 discloses [Equation image: media_image3.png], where f is a nonlinear function)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das as modified to use the graph embedding model as disclosed by Dettmers. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “predict new links in knowledge graphs” (Dettmers, 2nd Page, 4th Para.)
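For illustration only, the terminal reward quoted from Das, Section 2.1 above, R(ST) = I{et = e2q}, can be sketched as follows. This is a minimal Python sketch; the function name `terminal_reward` and the example entity identifiers are hypothetical. It covers only the binary indicator reward quoted from Das and does not attempt to reproduce the claimed reward function or the Dettmers scoring function, which appear only as images in this action.

```python
def terminal_reward(final_entity, answer_entity):
    """Return 1 if the walk ended on the correct answer, else 0.

    Mirrors the indicator I{et = e2q} quoted from Das, Section 2.1.
    """
    return 1 if final_entity == answer_entity else 0

# Hypothetical entity identifiers for a single query.
reward_hit = terminal_reward("e2q", "e2q")     # walk ended on the answer
reward_miss = terminal_reward("e_other", "e2q")  # walk ended elsewhere
```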

As per claim 20, the combination of Das, Srivastava, Su, and Sathish as shown above teaches the non-transitory machine-readable medium of claim 18, Das further teaches:
wherein the reward is generated using a function: 

[Equation image: media_image1.png]

wherein ST is a state at a target entity of a third node in the knowledge graph, es is a start entity of the current node, rq is a relation of a query received by the policy network, and eT is a target entity of the node corresponding to the observed answer, function f is a composition function over entities in the knowledge graph, and Rb(st) is a function determining a reward value (Das, Section 2.1, Rewards section discloses “We only have a terminal reward of +1 if the current location is the correct answer at the end and 0 otherwise. To elaborate, if ST = (et, e1q, rq, e2q) is the final state, then we receive a reward of +1 if et = e2q else 0, i.e. R(ST) = I{et = e2q}” (Das is teaching the first two parts of the Reward function))
The combination of Das, Srivastava, Su, and Sathish fails to explicitly teach:

[Equation image: media_image2.png]

However, Dettmers teaches:

[Equation image: media_image2.png] (Dettmers, Section 4 discloses [Equation image: media_image3.png], where f is a nonlinear function)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Das as modified with the teachings of Dettmers for at least the same reasons as discussed above in claim 11
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Trouillon et al. (“Complex Embeddings for Simple Link Prediction”) discloses a system for link prediction using knowledge graph embeddings
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached M-TR 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/H.R.M./Examiner, Art Unit 2123


/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123