DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. Action is responsive to Application filed 01/23/2020. 
3. Claims 1-20 are examined and pending, in which claims 1-2, 4, 7, 10-12, 14, 17 and 19-20 are rejected, claims 3, 5-6, 8-9, 13, 15-16 and 18 are objected to and claims 1, 11 and 19 are independent.
Information Disclosure Statement
4. The information disclosure statements filed 04/23/2020 are in compliance with 37 CFR 1.97(c) and have therefore been considered. The corresponding PTO-1449 forms have been signed and are attached.
Claim Rejections - 35 USC § 103
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

5.1. Claims 1-2, 10-12 and 19-20 are rejected under 35 U.S.C. § 103 as being unpatentable over 
Shah et al.: "TURN-BASED REINFORCEMENT LEARNING FOR DIALOG MANAGEMENT", (United States Patent Application Publication US 20190115027 A1, filed 2017-10-12; and published 2019-04-18, hereafter “Shah”), in view of 
Simonyan et al.: "TRAINING ACTION SELECTION NEURAL NETWORKS USING LOOK-AHEAD SEARCH", (United States Patent Application Publication US 20200143239 A1, filed 2018-05-28; and published 2020-05-07, hereafter “Simonyan”), and further in view of 
Ritter et al.: "DYNAMIC, AUTOMATED FULFILLMENT OF COMPUTER-BASED RESOURCE REQUEST PROVISIONING USING DEEP REINFORCEMENT LEARNING", (United States Patent Application Publication US 20200034701 A1, filed 2018-07-26; and published 2020-01-30, hereafter “Ritter”).

As per claim 1, Shah teaches a method of updating a multi-level data structure for controlling an agent, the method comprising:
accessing a data structure defining one or more nodes (See [0035]-[0036], the knowledge graph may include nodes that represent known entities (and in some cases, entity attributes), as well as edges that connect the nodes and represent relationships between the entities and enabling identification of all references to an entity class. Here the knowledge graph teaches the data structure and enabling identification of all references to an entity class reads on accessing the entities, the nodes), and
wherein a non-leaf node of the one or more nodes is associated with one or more edges for traversing to a subsequent node (See [0035], the knowledge graph may include nodes that represent known entities, as well as edges that connect the nodes and represent relationships between the entities. For example, a "banana" node may be connected (e.g., as a child) to a "fruit" node, which in turn may be connected (e.g., as a child) to "produce" and/or "food" nodes).
Shah does not explicitly teach wherein an edge of the one or more edges is associated with a visit count and a softmax state-action value estimation.
However, as an analogous art on endeavoring probability neural network, Simonyan teaches wherein an edge of the one or more edges is associated with a visit count and a softmax state-action value estimation (See [0098]-[0099], for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one, and updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge; and determines the target network output using the visit counts for the outgoing edges from the root node after the edge data has been updated based on the results of the look ahead search by applying a softmax over the visit counts for the outgoing edges from the root node to determine the probabilities in the target network output.).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine Simonyan’s teaching with the Shah reference because Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have enabled Shah to reduce uncertainty over a user's goals and latency expectations for a real-time dialog agent to leverage Monte Carlo tree search rollouts at inference time via reinforcement learning by rewarding actions performed.
Shah in view of Simonyan further teaches:
for each of a plurality of rounds, identifying a node trajectory including a series of nodes (Simonyan: [0011], performing the look ahead search may comprise traversing the state tree until a leaf node is reached. This may comprise selecting one of multiple edges connecting to a first node, based on an action score for the edge, to identify the next node in the tree. Here traversing on edge to next node teaches node trajectory).
However, Shah in view of Simonyan does not explicitly teach the identifying a node trajectory is performed based on an asymptotically converging sampling policy.
However, as analogous art in machine learning and neural networks, Ritter teaches that the identifying of a node trajectory is performed based on an asymptotically converging sampling policy (See [0232], training may continue until the output from the algorithm meets a certain threshold, meets a threshold for a given number of training cycles, or has processed for a given number of training cycles or episodes (e.g. a given number of training data scenarios may be generated). For example, meeting a threshold may include comparing the differences between output values and expected values to the threshold, or determining when output values for similar inputs converge within a threshold variance, or so on. Once the requisite number of training data sets are generated 1328 and used to train the system 1330, the parallelization is closed).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine Ritter’s teaching with the Shah in view of Simonyan reference because Ritter is dedicated to introducing a selection of concepts in a simplified form for provisioning a job, Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have provided Shah in view of Simonyan with a trained machine-learning dynamic provisioning agent and an asymptotically converging sampling policy for overcoming the poor quality results introduced by simple heuristics.
Shah in view of Simonyan and further in view of Ritter further teaches:
wherein the node trajectory includes a root node and a leaf node of the data structure (See Shah: [0035], the knowledge graph may include nodes that represent known entities, as well as edges that connect the nodes and represent relationships between the entities. For example, a "banana" node may be connected (e.g., as a child) to a "fruit" node, which in turn may be connected (e.g., as a child) to "produce" and/or "food" nodes. Here the “produce” or “food” node teaches the root node);
determining a reward indication associated with the node trajectory (See Simonyan: [0098]-[0099], for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one, and updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge; and determines the target network output using the visit counts for the outgoing edges from the root node after the edge data has been updated based on the results of the look ahead search by applying a softmax over the visit counts for the outgoing edges from the root node to determine the probabilities in the target network output. Here the count with softmax applied reads on the reward indication); and
for at least one non-leaf node in the node trajectory, updating the visit count and the softmax state-action value estimate associated with one or more edges of the non-leaf node based on the determined reward indication associated with the node trajectory (See [0097]-[0098], the system then updates the edge data for the edges traversed during the search based on the predicted return for the leaf node (step 410). In particular, for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one. The system also updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge.).
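For illustration only, the edge update described in the quoted passage of Simonyan ([0097]-[0098]) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the names `Edge`, `backup`, `visit_count` and `action_score` are hypothetical, and the incremental-mean formula is an assumption drawn from the quoted wording "new average of the predicted expected returns". The sketch implements only the averaging described in that passage.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Per-edge statistics; the field names are hypothetical."""
    visit_count: int = 0
    action_score: float = 0.0  # running mean of predicted returns


def backup(path, leaf_return):
    """Update every edge traversed in one search with the leaf's predicted return."""
    for edge in path:
        # Increment the visit count by a constant value (here, one).
        edge.visit_count += 1
        # Incremental mean: equals the average of all returns backed up through this edge.
        edge.action_score += (leaf_return - edge.action_score) / edge.visit_count


# Two searches: the first traverses both edges, the second only the first edge.
edges = [Edge(), Edge()]
backup(edges, 1.0)
backup(edges[:1], 0.0)
```

After these two calls, the first edge has been visited twice and holds the mean of the two returns, while the second edge has been visited once.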

As per claim 2, Shah in view of Simonyan and further in view of Ritter teaches the method of claim 1, comprising determining an action for controlling the agent based on the maximum softmax state-action value estimation at a given node (See Simonyan: [0099], The system determines the target action selection output for the current observation using the results of the look ahead search (step 412). In particular, the system determines the target network output using the visit counts for the outgoing edges from the root node after the edge data has been updated based on the results of the look ahead search. For example, the system can apply a softmax over the visit counts for the outgoing edges from the root node to determine the probabilities in the target network output. In some implementations, the softmax has a reduced temperature to encourage exploration of the state space. In some implementations, the softmax temperature is only reduced after a threshold number of look ahead searches have been performed within an episode to ensure that a diverse set of states are encountered during various episodes).
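For illustration only, a softmax applied over the visit counts of the outgoing edges of a root node, as described in the quoted passage of Simonyan ([0099]), can be sketched as follows. This is a minimal sketch under stated assumptions, not code from any cited reference: the function name `softmax_over_visit_counts`, the `temperature` parameter and the sample counts are hypothetical.

```python
import math


def softmax_over_visit_counts(visit_counts, temperature=1.0):
    """Turn per-edge visit counts into a probability distribution."""
    scaled = [count / temperature for count in visit_counts]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(scaled)
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical visit counts for three outgoing edges of a root node.
probs = softmax_over_visit_counts([10, 5, 1])
# Index of the edge with the highest probability.
best_action = max(range(len(probs)), key=probs.__getitem__)
```

The edge with the largest visit count receives the largest probability, so selecting the maximum probability selects the most visited edge.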

As per claim 10, Shah in view of Simonyan and further in view of Ritter, teaches the method of claim 1, wherein the data structure is a tree data structure (See Simonyan: [0011], performing the look ahead search may comprise traversing the state tree until a leaf node is reached. This may comprise selecting one of multiple edges connecting to a first node, based on an action score for the edge, to identify the next node in the tree. Here the state tree is the data structure and is the tree data structure).

As per claims 11 and 12, the claims recite a system for updating a multi-level data structure for controlling an agent, the system comprising: a processor (See Shah: [0013], one or more processors of one or more computing devices); and a memory coupled to the processor and storing processor executable instructions that, when executed, configure the processor (See Shah: [0013], the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured) to cause performance of the steps as recited in claims 1 and 2, respectively and as rejected above, under 35 U.S.C. § 103 as being unpatentable over Shah in view of Simonyan and further in view of Ritter.
Therefore, claims 11-12 are rejected along the same rationale that rejected claims 1-2, respectively.

As per claim 19, the claim recites a non-transitory computer-readable medium or media having stored thereon machine interpretable instructions which, when executed by a processor, cause the processor to perform (Shah: [0013], one or more non-transitory computer readable storage media storing computer instructions executable by one or more processors) a computer implemented method of updating a multi-level data structure for controlling an agent, the method comprising the steps as recited in claim 1, and as rejected above, under 35 U.S.C. § 103 as being unpatentable over Shah in view of Simonyan and further in view of Ritter.
Therefore, claim 19 is rejected along the same rationale that rejected claim 1.

As per claim 20, Shah teaches a multi-level data structure for controlling an agent comprising a plurality of nodes including a non-leaf node associated with one or more edges for traversing to a subsequent node (See [0035]-[0036], the knowledge graph may include nodes that represent known entities (and in some cases, entity attributes), as well as edges that connect the nodes and represent relationships between the entities and enabling identification of all references to an entity class, and the knowledge graph may include nodes that represent known entities, as well as edges that connect the nodes and represent relationships between the entities. For example, a "banana" node may be connected (e.g., as a child) to a "fruit" node, which in turn may be connected (e.g., as a child) to "produce" and/or "food" nodes).
Shah does not explicitly teach wherein an edge of the one or more edges is associated with a visit count and a softmax state-action value estimation.
However, as an analogous art on endeavoring probability neural network, Simonyan teaches wherein an edge of the one or more edges is associated with a visit count and a softmax state-action value estimation (See [0098]-[0099], for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one, and updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge; and determines the target network output using the visit counts for the outgoing edges from the root node after the edge data has been updated based on the results of the look ahead search by applying a softmax over the visit counts for the outgoing edges from the root node to determine the probabilities in the target network output.).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine Simonyan’s teaching with the Shah reference because Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have enabled Shah to reduce uncertainty over a user's goals and latency expectations for a real-time dialog agent to leverage Monte Carlo tree search rollouts at inference time via reinforcement learning by rewarding actions performed.
Shah in view of Simonyan further teaches:
wherein the multi-level data structure (See Shah: [0035], the knowledge graph may include nodes that represent known entities, as well as edges that connect the nodes and represent relationships between the entities. For example, a "banana" node may be connected (e.g., as a child) to a "fruit" node, which in turn may be connected (e.g., as a child) to "produce" and/or "food" nodes) was updated by a method comprising:
accessing the data structure (See Shah: [0035]-[0036], the knowledge graph may include nodes that represent known entities (and in some cases, entity attributes), as well as edges that connect the nodes and represent relationships between the entities and enabling identification of all references to an entity class. Here the knowledge graph teaches the data structure and enabling identification of all references to an entity class reads on accessing the entities, the nodes).
Shah in view of Simonyan further teaches:
for each of a plurality of rounds, identifying a node trajectory including a series of nodes (Simonyan: [0011], performing the look ahead search may comprise traversing the state tree until a leaf node is reached. This may comprise selecting one of multiple edges connecting to a first node, based on an action score for the edge, to identify the next node in the tree. Here traversing on edge to next node teaches node trajectory).
However, Shah in view of Simonyan does not explicitly teach the identifying a node trajectory is performed based on an asymptotically converging sampling policy.
However, as analogous art in machine learning and neural networks, Ritter teaches that the identifying of a node trajectory is performed based on an asymptotically converging sampling policy (See [0232], training may continue until the output from the algorithm meets a certain threshold, meets a threshold for a given number of training cycles, or has processed for a given number of training cycles or episodes (e.g. a given number of training data scenarios may be generated). For example, meeting a threshold may include comparing the differences between output values and expected values to the threshold, or determining when output values for similar inputs converge within a threshold variance, or so on. Once the requisite number of training data sets are generated 1328 and used to train the system 1330, the parallelization is closed).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine Ritter’s teaching with the Shah in view of Simonyan reference because Ritter is dedicated to introducing a selection of concepts in a simplified form for provisioning a job, Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have provided Shah in view of Simonyan with a trained machine-learning dynamic provisioning agent and an asymptotically converging sampling policy for overcoming the poor quality results introduced by simple heuristics.
Shah in view of Simonyan and further in view of Ritter further teaches:
wherein the node trajectory includes a root node and a leaf node of the data structure (See Shah: [0035], the knowledge graph may include nodes that represent known entities, as well as edges that connect the nodes and represent relationships between the entities. For example, a "banana" node may be connected (e.g., as a child) to a "fruit" node, which in turn may be connected (e.g., as a child) to "produce" and/or "food" nodes. Here the “produce” or “food” node teaches the root node);
determining a reward indication associated with the node trajectory (See Simonyan: [0098]-[0099], for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one, and updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge; and determines the target network output using the visit counts for the outgoing edges from the root node after the edge data has been updated based on the results of the look ahead search by applying a softmax over the visit counts for the outgoing edges from the root node to determine the probabilities in the target network output. Here the count with softmax applied reads on the reward indication); and
for at least one non-leaf node in the node trajectory, updating the visit count and the softmax state-action value estimate associated with one or more edges of the non-leaf node based on the determined reward indication associated with the node trajectory (See [0097]-[0098], the system then updates the edge data for the edges traversed during the search based on the predicted return for the leaf node (step 410). In particular, for each edge that was traversed during the search, the system increments the visit count for the edge by a predetermined constant value, e.g., by one. The system also updates the action score for the edge using the predicted expected return for the leaf node by setting the action score equal to the new average of the predicted expected returns of all searches that involved traversing the edge.)

5.2. Claims 4 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over 
Shah in view of Simonyan and further in view of Ritter, as applied to claims 1-2, 10-12 and 19-20  above, and further in view of 
Hoff: "COMPUTER-IMPLEMENTED SYSTEM AND METHOD FOR EFFICIENTLY PERFORMING AREA-TO-POINT CONVERSION OF SATELLITE IMAGERY FOR PHOTOVOLTAIC POWER GENERATION FLEET OUTPUT ESTIMATION", (United States Patent Application Publication US 20110282602 A1, filed 2011-07-25; and published 2011-11-17).
As per claim 4, Shah in view of Simonyan and further in view of Ritter does not explicitly teach the method of claim 1, wherein the asymptotically converging sampling policy is associated with a mean squared error lower bound.
However, Hoff teaches the method of claim 1, wherein the asymptotically converging sampling policy is associated with a mean squared error lower bound (See [0158], [0186]-[0187] and [0196], the clearness index correlation coefficient converges to the clearness correlation coefficient as the time interval increases; a probability density function integrating between 0 and $\sqrt{\text{Area}}$ equals 1 (i.e., $P[0 \le D \le \sqrt{\text{Area}}] = \int_0^{\sqrt{\text{Area}}} f_{\text{Quad}}\, dD = 1$); and a normal distribution function has a mean of $\sqrt{\text{Area}}$ and a standard deviation of $0.1\sqrt{\text{Area}}$, and the correlation coefficient is calculated by evaluating the distance function equal to $\sqrt{\text{Area}}$ for the upper bound and 0 for the lower bound).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine Hoff’s teaching with the Shah in view of Simonyan and further in view of Ritter reference because Hoff is dedicated to efficiently performing area-to-point conversion of satellite imagery, Ritter is dedicated to introducing a selection of concepts in a simplified form for provisioning a job, Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have enabled Shah in view of Simonyan and further in view of Ritter to apply a convergent function with a mean squared error lower bound, providing a more accurate result for the converging sampling policy.

As per claim 14, the claim recites a system for updating a multi-level data structure for controlling an agent, the system comprising: a processor (See Shah: [0013], one or more processors of one or more computing devices); and a memory coupled to the processor and storing processor executable instructions that, when executed, configure the processor (See Shah: [0013], the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured) to cause performance of the steps as recited in claim 4 and as rejected above, under 35 U.S.C. § 103 as being unpatentable over Shah in view of Simonyan and further in view of Ritter and Hoff.
Therefore, claim 14 is rejected along the same rationale that rejected claim 4.

5.3. Claims 7 and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over 
Shah in view of Simonyan and further in view of Ritter, as applied to claims 1-2, 10-12 and 19-20 above, and further in view of 
LIU et al.: "METHOD AND SYSTEM FOR ONLINE DECISION MAKING OF GENERATOR START-UP", (United States Patent Application Publication US 20200127457 A1, filed 2018-09-29; and published 2020-04-23, hereafter “LIU”).
As per claim 7, Shah in view of Simonyan and further in view of Ritter does not explicitly teach the method of claim 1, wherein determining a reward indication associated with the node trajectory is based on a Monte Carlo evaluation simulating the node trajectory from the root node to the leaf node of the data structure.
However, as an analogous application of Monte Carlo search and value networks, LIU teaches the method of claim 1, wherein determining a reward indication associated with the node trajectory is based on a Monte Carlo evaluation simulating the node trajectory from the root node to the leaf node of the data structure (See [0063] and [0073], searching and evaluating alternative lines to be restored in next step with Monte Carlo tree search and value network; and value network is a trained deep neural network, which is used in the simulation part of Monte Carlo tree search).
It would have been obvious to one having ordinary skill in the art at the time the applicant's application was filed to combine LIU’s teaching with the Shah in view of Simonyan and further in view of Ritter reference because LIU is dedicated to using Monte Carlo tree search and a value network for making an online decision to start up a generator, Ritter is dedicated to introducing a selection of concepts in a simplified form for provisioning a job, Simonyan is dedicated to selecting actions to be performed by a reinforcement learning agent and Shah is dedicated to turn-based reinforcement learning for dialog management, and the combined teaching would have enabled Shah in view of Simonyan and further in view of Ritter to apply a value network and Monte Carlo evaluation and simulation to further improve the result of reinforcement learning by rewarding.

As per claim 17, the claim recites a system for updating a multi-level data structure for controlling an agent, the system comprising: a processor (See Shah: [0013], one or more processors of one or more computing devices); and a memory coupled to the processor and storing processor executable instructions that, when executed, configure the processor (See Shah: [0013], the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured) to cause performance of the steps as recited in claim 7 and as rejected above, under 35 U.S.C. § 103 as being unpatentable over Shah in view of Simonyan and further in view of Ritter and LIU.
Therefore, claim 17 is rejected along the same rationale that rejected claim 7.
Allowable Subject Matter
6. Claims 3, 5-6, 8-9, 13, 15-16 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
References
7.1. The prior art made of record
A. US Patent Application Publication US-20190115027-A1.
B. US Patent Application Publication US-20200143239-A1.
C. US Patent Application Publication US-20200034701-A1.
D. US Patent Application Publication US-20110282602-A1.
7.2. The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure. 
E. US Patent Application Publication US-20200127457-A1.
F. US Patent Application Publication US-20210081804-A1.
Conclusion
8.1. Examiner has cited particular columns and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. SEE MPEP 2141.02 [R-5] VI. PRIOR ART MUST BE CONSIDERED IN ITS ENTIRETY, INCLUDING DISCLOSURES THAT TEACH AWAY FROM THE CLAIMS: A prior art reference must be considered in its entirety, i.e., as a whole, including portions that would lead away from the claimed invention. W.L. Gore & Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984); In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004). See also MPEP §2123.
8.2. In the case of amending the Claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. 
Contact Information
9. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KUEN S LU whose telephone number is (571)272-4114. The examiner can normally be reached on M-F, 8-19, Mid-Flex 2 hours.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mrs. Tamara T Kyle, can be reached on 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, please call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
KUEN S LU  /Kuen S Lu/
Art Unit 2156
Primary Patent Examiner
May 24, 2022