DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 4/3/2020 and the Remarks and Amendments filed on 8/16/2022. Acknowledgment is made of the claim of priority to U.S. Patent No. 10,646,996, filed on 7/23/2018, and to Provisional Application No. 62/535,703, filed on 7/21/2017.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 22, and 25 are rejected under 35 U.S.C. 103 as being obvious over Browne et al. (US 20210406774 A1, hereinafter “Browne”) in view of Denil et al. (US 20200167633 A1, hereinafter “Denil”).

Regarding claim 1, Browne discloses [a] method comprising: ([0017]; “a method associated with an AI system”)
generating a training curriculum that represents a first set of environments associated with a first concept, wherein generating the training curriculum comprises, ([0039]; “The instructor module is configured for training the AI model, the one or more trained machine-learning models, or a combination thereof on one or more concept nodes of a mental model to be learned by the AI model, the one or more trained machine-learning models, or the combination thereof using one or more curriculums for the training”; and [0140]; “The instructor module 324 can subsequently instruct the learner module 328 on training the neural network 104 (e.g., which lessons should be taught in which order) with the one or more curriculums for training the one or more concepts in the mental mode using the training data and one or more hyperparameters from the hyperlearner module 325”)
training a first neural network on the generated training curriculum, the first neural network being configured to control an agent to accomplish the first concept by performing actions to interact with the environment, ([0140]; “The instructor module 324 can subsequently instruct the learner module 328 on training the neural network 104 (e.g., which lessons should be taught in which order) with the one or more curriculums for training the one or more concepts in the mental mode using the training data and one or more hyperparameters from the hyperlearner module 325”; and [0049]; “The AI engine learns a meta-controller—or integrator—concept node in the AI model after the nodes feeding into that node are trained. The integrator node can combine these newly trained concept nodes with one or more pre-existing trained concept nodes, such as move and reach encoded in classical controllers, into a complete complex task of Grasp-n-Stacking contained in the resulting AI model”; and Figures 1B-1E; and [0052]; “Once the Grasp and Stack concepts are trained, then all four AI concepts are then trained to learned to work with each other. The meta-controller—(e.g., integrator/selector concept)—then learns to combine the newly trained concepts with an existing Move classical controller (an external entity of code) and a Reach function (another external entity of code) into a complete Grasp-n-Stack complex task”)
wherein training the first neural network comprises, for each environment in the first set of environments, using the first reward function that rewards the first neural network when the agent controlled by using the first neural network generates a value indicating that the agent has successfully accomplished the first concept ([0060]; “The instructor module and learner module may cooperate to put in the algorithms and curriculum for the Grasp training. Initially, the AI controlled robot is expected to flail and fail. However, over time, the AI controlled robot learns what to do based on the reward the AI engine gives the AI controlled robot (for success).”; and [0062]; “Thus, referring to FIG. 1B, to further simplify the learning problem and corresponding reward functions, the modules further break the top level concept of Grasp into a lower level of two concepts of: Orienting the hand around object in preparation for Grasping, and Pinching the object, etc. Likewise, the modules further break the top level concept of Stacking into a lower level of two concepts: Orienting the hand around object in preparation for stacking, and Orienting the stack, for a simpler reward functions”; and [0118]; “This is a discrete reinforcement learning problem, that the AI engine solves with an example learning algorithm, such as the DQN algorithm, using overall task success as the reward. (Note, any discrete reinforcement learning algorithm could be used.) To make this effective, the AI engine may not choose a new concept at each time step but rather train a specific concept until it reaches a termination condition”; the termination condition corresponds to the value indicating that the agent has accomplished the first concept, such as grasping an object).
Browne fails to explicitly disclose but Denil discloses for each environment in the first set of environments: generating a combination of multiple environment fragments that have been selected from a pre-defined vocabulary of environment fragments representing different properties of the environment, and ([0024]; “The agents may learn to distinguish distinct properties of an environment. This may be achieved by disentangling properties from features of objects identified in the environment. The agents may learn how instructions refer to individual properties and completely novel properties can be identified”; and [0067]; “Objects in the environment are identified by a collection of properties that are referenced by the program. The objects referenced by the program are referred to as relevant objects and their properties are set out in a relevant objects vector”; and [0107]; and [0025]; and [0012])
assigning a first reward function to the environment; and ([0048]; “There is only a requirement to specify a new program to obtain a new reward function that depends on semantic properties of objects in the environment”; and [0023]; “The example task described later relates to reaching, and the training is based on a reward dependent upon a part of the robot being near an object”; and [0013]).
Browne and Denil are analogous art because both are concerned with machine learning. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the environment fragments and reward functions of Denil with the method of Browne to yield the predictable result of, for each environment in the first set of environments, generating a combination of multiple environment fragments that have been selected from a pre-defined vocabulary of environment fragments representing different properties of the environment, and assigning a first reward function to the environment. The motivation for doing so would be to use a neural network to receive data representing an object within an environment, and to generate property data associated with a property of the object (Denil; Abstract).
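To make the claim 1 limitations as read above more concrete, the following minimal Python sketch shows one way a curriculum of environments could be composed from a pre-defined fragment vocabulary, with a reward function assigned to each environment. The vocabulary, the Environment class, and the sparse success reward are hypothetical illustrations assumed for this sketch. They are not taken from Browne, Denil, or the application, and no neural network training is shown.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical pre-defined vocabulary; each fragment stands for one environment property.
FRAGMENT_VOCABULARY = ["red_block", "blue_block", "table", "stack_base", "obstacle"]


@dataclass
class Environment:
    fragments: List[str]                # combination of fragments drawn from the vocabulary
    reward_fn: Callable[[bool], float]  # reward function assigned to this environment


def first_reward(agent_signals_success: bool) -> float:
    """Sparse reward: 1.0 only when the agent generates a value indicating success."""
    return 1.0 if agent_signals_success else 0.0


def generate_curriculum(num_envs: int, fragments_per_env: int, seed: int = 0) -> List[Environment]:
    """Build a set of environments, each a random combination of distinct vocabulary fragments."""
    rng = random.Random(seed)
    curriculum = []
    for _ in range(num_envs):
        combo = rng.sample(FRAGMENT_VOCABULARY, fragments_per_env)  # no repeated fragments
        curriculum.append(Environment(fragments=combo, reward_fn=first_reward))
    return curriculum


if __name__ == "__main__":
    for env in generate_curriculum(num_envs=3, fragments_per_env=2):
        print(env.fragments, env.reward_fn(True))
```

Sampling without replacement is an assumed design choice so that each environment combines distinct properties. The same reward function is assigned to every environment here only because the sketch covers a single concept.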

Regarding claim 22, it is a non-transitory computer storage medium claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

Regarding claim 25, it is a system claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.



Claims 2-5, 23, 24, 26, and 27 are rejected under 35 U.S.C. § 103 as being obvious over Browne and Denil and further in view of Marcos (Marcos, “Learning Sensorimotor Abstraction”, Nov. 24, 2010, final project, Aalto University Thesis Submission, pp. 1-72, hereinafter “Marcos”).

Regarding claims 2, 23, and 26, the rejections of claims 1, 22, and 25 are incorporated, and Browne discloses the agent controlled by using the neural network ([0128]; “For example, the trained AI model or the trained neural network 106 can be deployed in or used with a software application or a hardware-based system”; and [0259]; “The client computing system 802B can be, for example, one of the one or more client systems 210 of FIGS. 2A and 3A, and any one or more of the other client computing systems (e.g., 802A, 802C, 802D, 802E, 802F, 802G, 802H, and/or 804C) can include, for example, the software application or the hardware-based system in which the trained neural network 106 can be deployed”; and [0049]; “The AI engine platform first trains the concept nodes in the AI model to learn the concept nodes of, for example, pinch, orient the hand of the robot, and orient the stack using reinforcement learning. The AI engine learns a meta-controller—or integrator—concept node in the AI model after the nodes feeding into that node are trained. The integrator node can combine these newly trained concept nodes with one or more pre-existing trained concept nodes, such as move and reach encoded in classical controllers, into a complete complex task of Grasp-n-Stacking contained in the resulting AI model”).
Browne fails to explicitly disclose but Marcos discloses wherein the first concept is a bring-about concept; wherein the first reward function rewards [[the first neural network]] when the agent [[controlled by using the first neural network]] generates a value indicating that the agent has successfully brought about the first concept (Page 41, ¶5; “a layered motor hierarchy could be having a high-level concept such as write something [i.e., a bring-about concept]. A lower-level concept, that is, more specifically, could be to write with a computer keyboard [i.e., a bring-about concept],”; and Page 15, ¶3; “The motor representations of the cognitive architecture presented in this work are, as the actions in the brain, goal-oriented. In order to know if a goal was successfully achieved or not it is necessary some kind of evaluation. Receiving some kind of reward signal, (or, in its absence, an estimation of it) that indicates how good or bad an action was in a particular context, will serve decision-making in future steps (as the basal ganglia do) and planing processes. Planning processes will try to favour those actions that are more likely to achieve the goal of the motor representations while avoiding those that are more likely to fail”, the reward signal being the returned value that indicates the successful completion of the first concept; and Page 18, Figure 3.1; “Schema of the interaction of the reinforcement learning elements. The agent decides to carry out an action in the present environment, which can cause a change of state in the environment. The agent will receive a certain reward that will evaluate how good or bad was the action taken”; and Page 64, ¶4; “Receiving positive reward when a task is performed adequately, the agent can easily learn the sequence of actions”).
Browne, Denil, and Marcos are analogous art because all are concerned with machine learning. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the bring-about concept of Marcos with the method and neural network of Browne and Denil to yield the predictable result of wherein the first concept is a bring-about concept; wherein the first reward function rewards the first neural network when the agent controlled by using the first neural network generates a value indicating that the agent has successfully brought about the first concept. The motivation for doing so would be to implement a part of the sensorimotor hierarchy found in the human cortex, so that an agent will be able to acquire a certain skill in an unknown environment without any supervision (Marcos; Page 2).

Regarding claims 3, 24, and 27, the rejections of claims 1, 22, and 25 are incorporated, and Browne discloses wherein the first concept is a classification concept ([0147]; “wherein the concept is “get_high_score,” a prediction type for the concept is “classifier,” the concept follows input of the game state to the neural network, and the concept feeds output from the neural network”; and [0148], Table 2).
Browne further discloses the first neural network and the agent controlled by using the first neural network ([0128]; “For example, the trained AI model or the trained neural network 106 can be deployed in or used with a software application or a hardware-based system”; and [0259]; “The client computing system 802B can be, for example, one of the one or more client systems 210 of FIGS. 2A and 3A, and any one or more of the other client computing systems (e.g., 802A, 802C, 802D, 802E, 802F, 802G, 802H, and/or 804C) can include, for example, the software application or the hardware-based system in which the trained neural network 106 can be deployed”; and [0049]; “The AI engine platform first trains the concept nodes in the AI model to learn the concept nodes of, for example, pinch, orient the hand of the robot, and orient the stack using reinforcement learning. The AI engine learns a meta-controller—or integrator—concept node in the AI model after the nodes feeding into that node are trained. The integrator node can combine these newly trained concept nodes with one or more pre-existing trained concept nodes, such as move and reach encoded in classical controllers, into a complete complex task of Grasp-n-Stacking contained in the resulting AI model”).
Browne fails to explicitly disclose but Marcos discloses wherein the first reward function rewards [[the first neural network]] when the agent [[controlled by using the first neural network]] generates a value consistent with a presence or non-presence of the first concept in the first environment (Page 15, ¶3; “The motor representations of the cognitive architecture presented in this work are, as the actions in the brain, goal-oriented. In order to know if a goal was successfully achieved or not it is necessary some kind of evaluation. Receiving some kind of reward signal, (or, in its absence, an estimation of it) that indicates how good or bad an action was in a particular context, will serve decision-making in future steps (as the basal ganglia do) and planing processes. Planning processes will try to favour those actions that are more likely to achieve the goal of the motor representations while avoiding those that are more likely to fail”; and Page 18, Figure 3.1; “Schema of the interaction of the reinforcement learning elements. The agent decides to carry out an action in the present environment, which can cause a change of state in the environment. The agent will receive a certain reward that will evaluate how good or bad was the action taken”; and Page 64, ¶4; “Receiving positive reward when a task is performed adequately, the agent can easily learn the sequence of actions”).
Browne, Denil, and Marcos are analogous art because all are concerned with machine learning. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the reward function of Marcos, which rewards a value consistent with a concept, with the method and neural network of Browne and Denil to yield the predictable result of wherein the first reward function rewards the first neural network when the agent controlled by using the first neural network generates a value consistent with a presence or non-presence of the first concept in the first environment. The motivation for doing so would be to implement a part of the sensorimotor hierarchy found in the human cortex, so that an agent will be able to acquire a certain skill in an unknown environment without any supervision (Marcos; Page 2).
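The difference between the reward functions recited in claims 2 and 3 can be sketched as two small Python functions. This is a minimal sketch under assumed semantics. The value the agent generates is modeled as a boolean. As a further assumption, the bring-about reward is paid only when the agent's success signal is also accurate. The function names are hypothetical.

```python
def bring_about_reward(signaled_success: bool, concept_achieved: bool) -> float:
    """Claim 2 reading: reward when the agent reports that it has brought about the concept.

    Assumption: the report is rewarded only if the concept actually holds.
    """
    return 1.0 if (signaled_success and concept_achieved) else 0.0


def classification_reward(predicted_present: bool, concept_present: bool) -> float:
    """Claim 3 reading: reward an output consistent with presence or non-presence."""
    return 1.0 if predicted_present == concept_present else 0.0


if __name__ == "__main__":
    # Correctly reporting that the concept is absent is still rewarded.
    print(classification_reward(predicted_present=False, concept_present=False))  # 1.0
```

In this reading, the classification reward is symmetric: a correct "absent" output counts as much as a correct "present" output. The bring-about reward, by contrast, is paid out only on success.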

Regarding claim 4, the rejection of claim 1 is incorporated and Browne further discloses training the first neural network ([0140]; [0052]).
Browne fails to explicitly disclose but Marcos discloses using a second reward function different from the first reward function that rewards [[the first neural network]] when the agent [[controlled by using the first neural network]] successfully accomplishes the first concept but fails to generate a value indicating that the agent has successfully accomplished the first concept (Page 15, ¶3; “The motor representations of the cognitive architecture presented in this work are, as the actions in the brain, goal-oriented. In order to know if a goal was successfully achieved or not it is necessary some kind of evaluation. Receiving some kind of reward signal, (or, in its absence, an estimation of it) that indicates how good or bad an action was in a particular context, will serve decision-making in future steps (as the basal ganglia do) and planing processes. Planning processes will try to favour those actions that are more likely to achieve the goal of the motor representations while avoiding those that are more likely to fail,”; and Page 17, ¶4-Page 18, ¶1; “The last of the three, reinforcement learning, has been proved a useful way of learning in novel environments, a thing that supervised learning does not allow. It does not require the presence of a teacher, but rewarding signals. The rewarding signals are usually related to emotions such as the level of pleasure (good reward) or pain (bad reward) [i.e., a shaping reward] or happiness or hunger or any other emotion. In a baby, these signals are innate and, in an agent, they can be more or less easily pre-programmed. Of course, more complex rewarding signals could be obtained in later stages of the learning process”; and Page 18, Figure 3.1; “Schema of the interaction of the reinforcement learning elements. The agent decides to carry out an action in the present environment, which can cause a change of state in the environment. The agent will receive a certain reward that will evaluate how good or bad was the action taken”; and Page 64, ¶4; “Receiving positive reward when a task is performed adequately, the agent can easily learn the sequence of actions”).
The motivation to combine Browne, Denil, and Marcos is the same as discussed above with respect to claim 3.

Regarding claim 5, the rejections of claims 1 and 4 are incorporated, and Browne fails to explicitly disclose but Marcos discloses wherein the second reward function is a shaping reward function (Page 17, ¶4-Page 18, ¶1; “The last of the three, reinforcement learning, has been proved a useful way of learning in novel environments, a thing that supervised learning does not allow. It does not require the presence of a teacher, but rewarding signals. The rewarding signals are usually related to emotions such as the level of pleasure (good reward) or pain (bad reward) [i.e., a shaping reward] or happiness or hunger or any other emotion. In a baby, these signals are innate and, in an agent, they can be more or less easily pre-programmed. Of course, more complex rewarding signals could be obtained in later stages of the learning process”).
The motivation to combine Browne, Denil, and Marcos is the same as discussed above with respect to claim 2.
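As a hedged illustration of the first and second reward functions of claims 4 and 5, the sketch below pairs a sparse first reward with a second reward built around a potential-based shaping term. Potential-based shaping is a standard technique substituted here for illustration and is not drawn from Marcos. The distance-based potential, the 0.5 bonus, the discount value, and the function names are all assumptions.

```python
def first_reward(signaled_success: bool, concept_achieved: bool) -> float:
    """Sparse reward: paid only when the agent both achieves and reports the concept."""
    return 1.0 if (signaled_success and concept_achieved) else 0.0


def second_reward(signaled_success: bool, concept_achieved: bool,
                  prev_distance: float, distance: float,
                  gamma: float = 0.99, unreported_bonus: float = 0.5) -> float:
    """Second reward with a potential-based shaping term.

    The potential is assumed to be phi(s) = -distance to the goal, so the shaping
    term gamma * phi(s') - phi(s) is positive when the agent moves closer.
    A bonus is also paid when the concept is achieved but not reported.
    """
    shaping = gamma * (-distance) - (-prev_distance)
    bonus = unreported_bonus if (concept_achieved and not signaled_success) else 0.0
    return shaping + bonus


if __name__ == "__main__":
    # Agent reaches the goal (distance 1.0 -> 0.0) but never reports success.
    print(first_reward(False, True), second_reward(False, True, 1.0, 0.0))
```

The example shows the case claim 4 describes: the first reward gives nothing because no success value was generated, while the second reward still credits both the progress toward the goal and the unreported accomplishment.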

Claims 6-8, 19, and 20 are rejected under 35 U.S.C. § 103 as being obvious over Browne and Denil in view of Stramandinoli et al. (Stramandinoli et al., “The grounding of higher order concepts in action and language: A cognitive robotics model”, Aug. 2012, Neural Networks, Volume 32, pp. 165-173, hereinafter “Stramandinoli”).

Regarding claim 6, the rejection of claim 1 is incorporated and Browne fails to explicitly disclose but Stramandinoli discloses wherein performing actions to interact with the environment comprises selecting from a set of primitive actions (Page 168, Column 1; “the neural controller of a simulated iCub robot has been trained to learn a set of words that express general actions and characterized by an evident sensorimotor component. Subsequently, combining basic words grounded in sensorimotor experience the robot learns what we call "higher-order" concepts. The training of the robot consists of three incremental steps: (i) the Basic Grounding (BG), (ii) the Higher-order Grounding 1 (HG1) and (iii) the Higher-order Grounding 2 (HG2). During the BG training stage, the simulated robot learns to perform a set of action primitives through direct sensorimotor experience [i.e., training a first sensorimotor program to accomplish the first concept using a set of primitive actions],”; and Page 169, Column 2; “To form higher-order concepts, which refer to words whose meaning is a combination of basic motor primitives, the same methodology of the previous model is adopted. After the neural network has learnt the associations between basic grounding words and motor action primitives, the various stages that lead to the acquisition of combinatorial meaning,”).
Browne, Denil, and Stramandinoli are analogous art because all are concerned with robotic training.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in robotic training to combine the primitive actions of Stramandinoli with the method of Browne and Denil to yield the predictable result of wherein performing actions to interact with the environment comprises selecting from a set of primitive actions. The motivation for doing so would be for the purpose of training of higher order concepts based on primitive concepts (Stramandinoli; Pages 8, 9).

Regarding claim 7, the rejections of claims 1 and 6 are incorporated, and Browne fails to explicitly disclose but Stramandinoli discloses wherein an action of the set of primitive actions pushes an object of the environment (Page 169, §4.2; the section discloses the PUSH function, which pushes an object of a first environment).
The motivation to combine Browne, Denil, and Stramandinoli is the same as discussed above with respect to claim 6.

Regarding claim 8, the rejections of claims 1 and 6 are incorporated, and Browne further discloses executing the trained first neural network on a robotic arm system comprising a robotic arm actuator, wherein an action of the set of primitive actions actuates the robotic arm actuator (Figure 1E; the figure discloses the robotic arm actuator on which the trained neural network is executed; and [0051]; “FIG. 1E illustrates a diagram of an embodiment of an example AI model being utilized by a robotic arm 100E to carry out individual concepts in the complex task. Stages of the complex task may include (a) Moving to the object, (b) Reaching for the object, (c) Grasping the object, and (d) Stacking the object on a stack of objects”).

Regarding claim 19, the rejection of claim 1 is incorporated and Browne fails to explicitly disclose but Stramandinoli discloses wherein the first neural network is represented by a recurrent neural network (Abstract; “The use of recurrent neural network also permits the learning of higher-order concepts based on temporal sequences of action primitives”; and Page 167, Column 1; “The second model uses recurrent neural networks as it permits the implementation of the learning of temporal sequences of actions”).
The motivation to combine Browne, Denil, and Stramandinoli is the same as discussed above with respect to claim 6.


Regarding claim 20, the rejection of claim 9 is incorporated and Browne fails to explicitly disclose but Stramandinoli discloses wherein the first neural network is trained using natural policy optimization (Page 167, Column 2; “In order to teach the iCub to perform a set of actions primitives, the Action Primitives library of the iCub repository has been used. The library relies on the YARP Cartesian Interface, which allows the user to control the upper limbs of the robot by defining a specific pose (position and orientation in axis-angle representation) for the end-effector. In order to determine the joints configuration to move the robot arms to a desired position, a nonlinear optimization technique is used. The Action Primitives library provides a set of functions to perform basic action primitives and to combine them in order to obtain more complex behaviours”; and Page 168, Column 1; “the neural controller of a simulated iCub robot has been trained to learn a set of words that express general actions and characterized by an evident sensorimotor component”).
The motivation to combine Browne, Denil, and Stramandinoli is the same as discussed above with respect to claim 6.

Claims 13-15 are rejected under 35 U.S.C. § 103 as being obvious over Browne, Denil, Marcos, and Stramandinoli.

Regarding claim 13, the rejection of claim 6 is incorporated and Browne discloses the first neural network ([0133]).
Browne fails to explicitly disclose but Marcos discloses generating another training curriculum that represents a second set of environments associated with a second concept different from the first concept (Page 38, ¶; “It is usually easier to relate higher-level concepts of sensory and motor actions than low-level ones, inside those two main modules (sensory or motor) there are several submodules, which represent different levels of abstraction in the data: some will correspond to basic concepts (sensory or motor depending on the type of submodule) while others will correspond to complex ones [i.e. a second concept],”; and Page 41, ¶4; “a layered sensory hierarchy, could be based on the one present in the visual cortex. In the low-level, the primary visual cortex, the neurons represent orientation of lines. In the next level, area V2, they represent contours and figures. Proceeding that way, it would be possible to arrive at higher-levels of abstraction that represent different objects such as a house, a tree or a car to name but a few [i.e. generating a second plurality of environments that represent a second concept]”)
Browne, Denil, and Marcos are analogous art because all are concerned with machine learning. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the additional training curriculum directed to a second concept of Marcos with the method and neural network of Browne and Denil to yield the predictable result of generating another training curriculum that represents a second set of environments associated with a second concept different from the first concept. The motivation for doing so would be to implement a part of the sensorimotor hierarchy found in the human cortex, so that an agent will be able to acquire a certain skill in an unknown environment without any supervision (Marcos; Page 2).
Browne fails to explicitly disclose but Stramandinoli discloses training [[, using the other training curriculum,]] a second neural network to accomplish the second concept using the first neural network (Page 168, Column 1; “the neural controller of a simulated iCub robot has been trained to learn a set of words that express general actions and characterized by an evident sensorimotor component. Subsequently, combining basic words grounded in sensorimotor experience the robot learns what we call "higher-order" concepts. The training of the robot consists of three incremental steps: (i) the Basic Grounding (BG), (ii) the Higher-order Grounding 1 (HG1) and (iii) the Higher-order Grounding 2 (HG2). During the BG training stage, the simulated robot learns to perform a set of action primitives through direct sensorimotor experience while the HG1 and HG2 training phases implement the grounding transfer process when the grounding of basic terms is transferred to higher-order words. The training algorithm is a standard back-propagation. The grounding transfer mechanism from basic concepts to higher level concepts consists of multiple steps, depending on the number of action primitives that are combined to obtain higher-order representations [i.e. training a second sensorimotor program to accomplish the second concept using the first sensorimotor program and the set of primitive actions],”; and Page 13, ¶2-Page 14, ¶1; “to form higher-order concepts, which refer to words whose meaning is a combination of basic motor primitives, the same methodology of the previous model is adopted. After the neural network has learnt the associations between basic grounding words and motor action primitives, the various stages that lead to the acquisition of combinatorial meaning,”; and §3; “Two different neural architectures will be experimented with for the learning of higher-order concepts”).
Browne, Denil, Marcos, and Stramandinoli are analogous art because all are concerned with robotic training. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in robotic training to combine the training of a second neural network of Stramandinoli with the method of Browne, Denil, and Marcos to yield the predictable result of training, using the other training curriculum, a second neural network to accomplish the second concept using the first neural network. The motivation for doing so would be for the purpose of training higher-order concepts based on primitive concepts (Stramandinoli; Pages 8, 9).

Regarding claim 14, the rejections of claims 1, 6, and 13 are incorporated, and Browne fails to explicitly disclose but Stramandinoli discloses wherein the second neural network is trained to interact with the environment using at least the set of primitive actions, wherein the second neural network calls the first neural network as an additional action (Page 169, §4.2; the section discloses training the program using primitive actions and calling the first neural network).
The motivation to combine Browne, Denil, Marcos, and Stramandinoli is the same as discussed above with respect to claim 13.
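In the arrangement of claim 14, the second network's action set includes the primitive actions and also the first network as an additional action. This resembles an options-style hierarchy. The following minimal Python sketch assumes that policies are plain callables rather than trained neural networks. The primitive names, the "call:" prefix, and the "done" sentinel are hypothetical.

```python
from typing import Callable, Dict, List

# A trained sub-policy is modeled as a callable from step index to a primitive or "done".
Policy = Callable[[int], str]

PRIMITIVE_ACTIONS = ["move", "reach", "push", "pinch"]  # hypothetical primitive set
CALL_PREFIX = "call:"


def build_action_space(sub_policies: Dict[str, Policy]) -> List[str]:
    """Second network's actions: all primitives plus one call action per first network."""
    return PRIMITIVE_ACTIONS + [CALL_PREFIX + name for name in sub_policies]


def expand(action: str, sub_policies: Dict[str, Policy], max_steps: int = 20) -> List[str]:
    """Return the primitive actions actually executed for a chosen action."""
    if not action.startswith(CALL_PREFIX):
        return [action]  # a primitive is executed directly
    policy = sub_policies[action[len(CALL_PREFIX):]]
    executed: List[str] = []
    for step in range(max_steps):
        primitive = policy(step)
        if primitive == "done":  # sub-policy signals it accomplished its concept
            break
        executed.append(primitive)
    return executed


if __name__ == "__main__":
    def grasp(step: int) -> str:
        return ["reach", "pinch", "done"][step]

    policies = {"grasp": grasp}
    print(build_action_space(policies))
    print(expand("call:grasp", policies))
```

From the second network's point of view, "call:grasp" is a single action choice, even though it unrolls into several primitives. The max_steps cap is an assumed safeguard for a sub-policy that never signals completion.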

Regarding claim 15, the rejections of claims 1, 6, and 13 are incorporated, and Browne fails to explicitly disclose but Stramandinoli discloses wherein the second neural network calls the first neural network as an observation (Page 167, Column 1; “a simulated iCub humanoid robot learns an embodied representation of action words through the interaction with the environment and by linking the effects of its own actions with the behaviour observed on the object before and after the action”; and Page 168, Column 1; “the neural controller of a simulated iCub robot has been trained to learn a set of words that express general actions and characterized by an evident sensorimotor component. Subsequently, combining basic words grounded in sensorimotor experience the robot learns what we call "higher-order" concepts. The training of the robot consists of three incremental steps: (i) the Basic Grounding (BG), (ii) the Higher-order Grounding 1 (HG1) and (iii) the Higher-order Grounding 2 (HG2). During the BG training stage, the simulated robot learns to perform a set of action primitives through direct sensorimotor experience”; and Page 8, ¶3-Page 9, ¶1; “The BG training stage is a simple association between input and output and as it can be observed from the network is able to learn this task in very few iterations (200)”).
Browne, Denil, Marcos, and Stramandinoli are analogous art because all are concerned with robotic training. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in robotic training to combine Stramandinoli's use of the first neural network as an observation with the method of Browne, Denil, and Marcos to yield the predictable result of wherein the second neural network calls the first neural network as an observation. The motivation for doing so would be for the purpose of training higher-order concepts based on primitive concepts (Stramandinoli; Pages 8, 9).

Claim 21 is rejected under 35 U.S.C. § 103 as being obvious over Browne and Denil in view of Al Hasan et al. (US 20200043610 A1, hereinafter “Al Hasan”).

Regarding claim 21, the rejection of claim 1 is incorporated and Browne fails to explicitly disclose but Al Hasan discloses evaluating the environment using a concept filter to generate an evaluation result of the environment with respect to the first concept; and assigning the first reward function to the environment based on the evaluation result ([0092]; “An example of such a reward function will be described in detail below. The reward may be computed, for example, every time the agent 312 selects an action and the environment process updates the environment in accordance with that action (e.g., by changing the initial concepts 302). The rewards delivered by the reward function 310 may be used by a reinforcement learning algorithm during the training phase (or during the operation phase when dynamic learning is implemented) to train the agent to choose those actions most likely to lead to a high reward for any given state. For example, some embodiments may implement deep Q-learning, as will be explained in greater detail below. Such embodiments may implement a state action value function Q(s, a) that is trained to return a score for action a to be performed in state s. The action a with the highest Q-score may thus be selected by the agent 312”).
Browne, Denil, and Al Hasan are analogous art because all are concerned with robotic training.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in robotic training to combine the concept filter and evaluation result of Al Hasan with the method of Browne and Denil to yield the predictable result of evaluating the environment using a concept filter to generate an evaluation result of the environment with respect to the first concept; and assigning the first reward function to the environment based on the evaluation result. The motivation for doing so would be to train the agent to choose those actions most likely to lead to a high reward for any given state (Al Hasan; [0092]).

Response to Arguments

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the objection to claims 9-20 have been fully considered and are persuasive.  The objection to claims 9-20 is withdrawn.

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the 35 USC § 112(b) rejection of claims 3 and 16-18 have been fully considered and are persuasive.  The 35 USC § 112(b) rejection of claims 3 and 16-18 is withdrawn.

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the 35 USC § 112(d) rejection of claim 7 have been fully considered and are persuasive.  The 35 USC § 112(d) rejection of claim 7 is withdrawn.

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the double patenting rejection of claims 1-6, 8, 9, 13-15, and 20 have been fully considered and are persuasive.  The double patenting rejection of claims 1-6, 8, 9, 13-15, and 20 is withdrawn.

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the 35 USC § 101 rejection of claims 1-6 and 9-20 have been fully considered and are persuasive. The 35 USC § 101 rejection of claims 1-6 and 9-20 is withdrawn.

Applicant’s arguments and amendments, filed on 8/16/2022, with respect to the 35 USC § 102(a)(1) rejection of claims 1, 2, 4, and 5 and the 35 USC § 103 rejection of claims 3 and 6-20 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 22, and 25.  Browne and Denil are now being used to render claims 1, 22, and 25 obvious under 35 USC § 103.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127