DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed on 10/26/2021 have been fully considered as follows:
Applicant argues that the claim interpretation should not be maintained for claim 2. This argument is persuasive in view of the amendments. Therefore, the claim interpretation is not maintained for claim 2.
Applicant argues that the 35 USC 101 rejection should not be maintained in view of the amendments. This argument is persuasive in view of the amendments. Therefore, the rejection is not maintained.
Applicant argues that the 35 USC 102 rejection of claim 1 should not be maintained because Tremblay does not teach processor circuitry. However, Tremblay [0023] teaches “The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application.” Therefore, the argument is not persuasive.
Applicant argues that Tremblay teaches training parameters that are not a first unidimensional saliency value of a first percept or a second unidimensional saliency value of a second percept. However, a new ground of rejection is made in view of the amendments.
Applicant argues that the 35 USC 102 rejection of claim 9 should not be maintained because Tremblay does not include a first saliency value of a first percept or a second saliency value of a second percept, the first saliency value and the second saliency value representative of an information gain. However, a new ground of rejection is made in view of the amendments.

Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should not refer to purported merits or speculative applications of the invention and should not compare the invention with the prior art.
If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives. 
Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) couple the word “means” with functional language without reciting sufficient structure to perform the recited function. Such claim limitation(s) is/are: “means for” in claims 9, 10, 11, 15, and 16.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 9-14, 17-22, and 41-43 are rejected under 35 U.S.C. 103 as being unpatentable over Tremblay (US 20190228495 A1) in view of Wood (US 20210034959 A1).
Regarding Claim 1, Tremblay teaches an apparatus comprising (Abstract: Various embodiments enable a robot, or other autonomous or semi-autonomous device or system, to receive data involving the performance of a task in the physical world.): memory (Fig. 1, memory 114);
instructions; and ([0021] The instructions can be provided through an input mechanism on the robot 102)
processor circuitry to execute the instructions to: ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.)
identify a first percept and a second percept from data gathered from a demonstration of a task ([0029] a perception network 302 can be a deep neural network that accepts the demonstration data captured of the performance, such as may include image, distance, and other data as discussed herein. The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] the perception network 302 can include or utilize two neural networks.); 
map a trajectory based on the first percept and the second percept, the first percept skewed based on the first unidimensional saliency value, the second percept skewed based on the second unidimensional saliency value ([0041] Given a single image, a perception network can infer the locations of objects in the scene and their relationships. These networks can perform object detection with pose estimation, as well as relationship inference. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE} [0045] a primary purpose of various embodiments is to learn a human-readable program from a real-world demonstration. While a sensor such as a camera watches the scene, an agent (such as a person) can move the objects or perform the actions. As the demonstration is being performed, the perception network detects the objects and their relationships. Once the demonstration is complete, the state tensor from the relationship inference is thresholded to yield a set of discrete relationships between the objects. This tensor is sent to a program generation network which outputs a human-readable plan to execute.); 
determine a plurality of variations of the trajectory and create a collection of trajectories including the trajectory and the variations of the trajectory (Fig. 2A-2D [0020] This can include, for example, a person performing a task in a task environment involving one or more objects. In a manufacturing environment this might involve assembling two or more parts, while in a warehouse setting this might involve stacking objects or placing those objects on specific shelves. For a healthcare environment this might involve sanitizing a piece of medical equipment, while in a home environment this might involve emptying the dishwasher. Various other types of tasks can be performed as well within the scope of the various embodiments as would be apparent in light of the teachings and suggestions contained herein. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE}); 
imitate an action based on a first simulated signal from a first neural network of a first modality and a second simulated signal from a second neural network of a second modality, the action representative of a perceptual skill ([0029] Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan.); and create source code for a robot to execute the perceptual skill (Claim 1: using an execution neural network and the plan, an instruction readable by the robotic device to cause a robotic device, upon execution of the instruction, to perform the task).
Tremblay does not expressly disclose, but Wood discloses, calculate a first unidimensional saliency value of the first percept and a second unidimensional saliency value of the second percept ([0113] Given a task definition, the architecture of the present invention first determines and stores the task-relevant/salient entities in working memory, using prior knowledge stored in the long-term memory of AN or HACM circuits analogous to ART circuits. For a given scene, the model then attempts to detect the most relevant entity by biasing its visual attention with the entity's learned low-level features. It then attends to the most salient location in the scene and attempts to recognize the object (in the TC) using AN or HACM circuits analogous to ART circuits that resonate with the features found in the salient location. The system updates its working memory with the task-relevance of the recognized entity and updates a topographic task relevance map (in the PC) with the location of the recognized entity. The stored objects and task-relevance maps are subsequently used by the PFC to construct predictions or plans. [0115] The present invention includes an internal valuation module 510 to mimic basic human motivations. The internal valuation module 510 is configured to evaluate the value of the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and the context. For example, the internal valuation module values the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and context such that they are modeled mathematically to have a value in a range between zero and one, where zero is the least valuable and one is the most valuable.).
In this way, the system of Wood includes devices that can recognize, interpret, process, and simulate human reactions and affects, such as emotional responses to internal and external sensory stimuli, and provides real-time reinforcement learning modeling that reproduces human affects and/or reactions. Like Tremblay, Wood is concerned with learning methods.
Therefore, from these teachings of Wood and Tremblay, one of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to apply the teachings of Wood to the system of Tremblay, since doing so would enhance the system by including an architecture that integrates models rooted in neural principles, mechanisms, and computations for which there is neuro-physiological data and which link to human behaviors based on a large body of psychophysical data.
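For orientation only, the limitation addressed in the combination above (a single scalar saliency value per percept, used to skew that percept before a trajectory is mapped) can be pictured with the minimal Python sketch below. Nothing in it is taken from the application, Tremblay, or Wood: the function names, the inverse-variance saliency measure, and the weighted-average mapping are assumptions made for illustration, and the zero-to-one range is chosen only to mirror the valuation range quoted from Wood [0115].

```python
from statistics import pvariance

def unidimensional_saliency(samples):
    """Hypothetical scalar saliency in (0, 1]: a steadier percept scores higher."""
    return 1.0 / (1.0 + pvariance(samples))

def skew_percept(samples, saliency):
    """Scale every sample of a percept by that percept's saliency value."""
    return [saliency * s for s in samples]

def map_trajectory(percept_a, percept_b):
    """Assumed mapping: saliency-weighted average of the two skewed percepts."""
    s_a = unidimensional_saliency(percept_a)
    s_b = unidimensional_saliency(percept_b)
    skewed_a = skew_percept(percept_a, s_a)
    skewed_b = skew_percept(percept_b, s_b)
    total = s_a + s_b  # strictly positive because each saliency is > 0
    return [(a + b) / total for a, b in zip(skewed_a, skewed_b)]
```

Under these assumptions, a percept whose samples fluctuate widely contributes less to the mapped trajectory than a steady one.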
Regarding Claim 2, Tremblay teaches wherein the processor circuitry is to: determine a deviation of the action from a mean of the collection of trajectories ([0027] A robot capturing image data representative of these actions could analyze the image data to determine orientation, location, relationship, and other information about the objects, such as will be described later herein with the approximation 260 illustrated in FIG. 2D. [0034] During execution of the program in a closed loop system, new image data is produced that can be used to perceive what is happening in the physical world, so any deviations can be detected and addressed accordingly.); and
change a weight of one or more of the first simulated signal or the second simulated signal based on the deviation. ([0054] During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset.)
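As a reading aid for the two limitations of claim 2, the sketch below shows one way a deviation from the mean of a collection of trajectories could be computed and then used to change signal weights. It is an illustrative assumption, not the method of the application or of Tremblay: the mean-absolute-difference deviation, the `rate` parameter, and the renormalisation step are all hypothetical.

```python
def mean_trajectory(collection):
    """Point-wise mean of equally long trajectories."""
    n = len(collection)
    return [sum(points) / n for points in zip(*collection)]

def deviation_from_mean(action, collection):
    """Mean absolute difference between an action and the mean trajectory."""
    mean = mean_trajectory(collection)
    return sum(abs(a - m) for a, m in zip(action, mean)) / len(mean)

def reweight(weights, deviations, rate=0.5):
    """Shrink each signal's weight as its deviation grows, then renormalise."""
    raw = [w / (1.0 + rate * d) for w, d in zip(weights, deviations)]
    total = sum(raw)
    return [r / total for r in raw]
```

For example, with two equally weighted signals whose proposed actions deviate by 0.0 and 1.0, `reweight([0.5, 0.5], [0.0, 1.0])` shifts weight toward the signal that stays closer to the mean.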
Regarding Claim 3, Tremblay teaches wherein the source code includes the weight. ([0041] Each stage in this example is a series of convolutional/ReLU layers with weights that are learned during training.)
Regarding Claim 4, Tremblay teaches wherein the first percept is a contact-less percept.  ([0029] The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] The number of belief maps produced can correspond to the number of features to be located)
Regarding Claim 5, Tremblay teaches wherein the second percept is a contact percept.  ([0029] The percepts can then be fed to a plan generation network 304, which can be another deep neural network that can process the percepts to generate, or infer, a human-readable plan corresponding to the task. This plan can be provided to a user for confirmation, and can enable another performance or editing of the plan if one or more changes are required. Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan [0030] The number of belief maps produced can correspond to…the number of objects corresponding to the task)
Regarding Claim 6, Tremblay teaches wherein the variations are random displacements from the trajectory. ([0040] An example system relies on image-centric domain randomization for training the perception network. The ability to generate human-interpretable representations can be important for modularity and stronger generalization.) 
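To make the claim 6 limitation concrete, the following sketch generates variations of a trajectory as random displacements and gathers them with the original into a collection. It is a hedged illustration only: the Gaussian noise model, the `scale` and `seed` parameters, and the function names are assumptions and do not come from the application or from the domain randomization described in Tremblay [0040].

```python
import random

def random_variations(trajectory, count, scale=0.1, seed=None):
    """Return `count` copies of the trajectory, each point displaced by Gaussian noise."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, scale) for p in trajectory] for _ in range(count)]

def build_collection(trajectory, count, scale=0.1, seed=None):
    """Collection holding the original trajectory followed by its random variations."""
    return [list(trajectory)] + random_variations(trajectory, count, scale, seed)
```

Passing a fixed `seed` makes the displacements reproducible, which is convenient when comparing runs.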
Regarding Claim 9, Tremblay teaches A system comprising (Abstract: Various embodiments enable a robot, or other autonomous or semi-autonomous device or system, to receive data involving the performance of a task in the physical world): 
means for identifying a first percept and a second percept from data gathered from a demonstration of a task ([0029] a perception network 302 can be a deep neural network that accepts the demonstration data captured of the performance, such as may include image, distance, and other data as discussed herein. The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] the perception network 302 can include or utilize two neural networks.); 
means for mapping a trajectory based on the first percepts and the second percept, the first percept skewed based on the first saliency value, the second percept skewed based on the second saliency value ([0041] Given a single image, a perception network can infer the locations of objects in the scene and their relationships. These networks can perform object detection with pose estimation, as well as relationship inference. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE} [0045] a primary purpose of various embodiments is to learn a human-readable program from a real-world demonstration. While a sensor such as a camera watches the scene, an agent (such as a person) can move the objects or perform the actions. As the demonstration is being performed, the perception network detects the objects and their relationships. Once the demonstration is complete, the state tensor from the relationship inference is thresholded to yield a set of discrete relationships between the objects. This tensor is sent to a program generation network which outputs a human-readable plan to execute.); 
means for determining a plurality of variations of the trajectory and a collection of trajectories including the trajectory and the variations of the trajectory (Fig. 2A-2D [0020] This can include, for example, a person performing a task in a task environment involving one or more objects. In a manufacturing environment this might involve assembling two or more parts, while in a warehouse setting this might involve stacking objects or placing those objects on specific shelves. For a healthcare environment this might involve sanitizing a piece of medical equipment, while in a home environment this might involve emptying the dishwasher. Various other types of tasks can be performed as well within the scope of the various embodiments as would be apparent in light of the teachings and suggestions contained herein. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE});
means for imitating an action based on a first simulated signal from a first neural network of a first modality and a second simulated signal from a second neural network of a second modality ([0029] Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan.); and means for executing a percept-action trajectory to perform the action (Claim 1: using an execution neural network and the plan, an instruction readable by the robotic device to cause a robotic device, upon execution of the instruction, to perform the task).
Tremblay does not expressly disclose, but Wood discloses, means for calculating a first saliency value of the first percept and a second saliency value of the second percept ([0113] Given a task definition, the architecture of the present invention first determines and stores the task-relevant/salient entities in working memory, using prior knowledge stored in the long-term memory of AN or HACM circuits analogous to ART circuits. For a given scene, the model then attempts to detect the most relevant entity by biasing its visual attention with the entity's learned low-level features. It then attends to the most salient location in the scene and attempts to recognize the object (in the TC) using AN or HACM circuits analogous to ART circuits that resonate with the features found in the salient location. The system updates its working memory with the task-relevance of the recognized entity and updates a topographic task relevance map (in the PC) with the location of the recognized entity. The stored objects and task-relevance maps are subsequently used by the PFC to construct predictions or plans. [0115] The present invention includes an internal valuation module 510 to mimic basic human motivations. The internal valuation module 510 is configured to evaluate the value of the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and the context. For example, the internal valuation module values the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and context such that they are modeled mathematically to have a value in a range between zero and one, where zero is the least valuable and one is the most valuable.), the first saliency value and the second saliency value representative of an information gain ([0124] The Where/How processing stream's spatial and motor mismatch-based maps and gains can continually forget their old parameters to instate the new parameters that are needed to control the system in its present form.).
In this way, the system of Wood includes devices that can recognize, interpret, process, and simulate human reactions and affects, such as emotional responses to internal and external sensory stimuli, and provides real-time reinforcement learning modeling that reproduces human affects and/or reactions. Like Tremblay, Wood is concerned with learning methods.
Therefore, from these teachings of Wood and Tremblay, one of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to apply the teachings of Wood to the system of Tremblay, since doing so would enhance the system by including an architecture that integrates models rooted in neural principles, mechanisms, and computations for which there is neuro-physiological data and which link to human behaviors based on a large body of psychophysical data.
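The phrase "representative of an information gain" in claim 9 is not given a formula in this Office action. One common reading, offered here purely as an assumption, is the reduction in Shannon entropy between a prior and a posterior distribution. The sketch below illustrates that reading; the function names are hypothetical, and nothing here is asserted to be the applicant's or Wood's computation.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * log2(p) for p in probs if p > 0.0)

def information_gain(prior, posterior):
    """Entropy reduction when the belief moves from `prior` to `posterior`."""
    return entropy(prior) - entropy(posterior)
```

On this reading, a percept that resolves an even two-way uncertainty completely yields a gain of one bit, while a percept that leaves the belief unchanged yields zero.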
Regarding Claim 10, Tremblay teaches further including: 
means for determining a deviation of the action from a mean of the collection of trajectories ([0027] A robot capturing image data representative of these actions could analyze the image data to determine orientation, location, relationship, and other information about the objects, such as will be described later herein with the approximation 260 illustrated in FIG. 2D. [0034] During execution of the program in a closed loop system, new image data is produced that can be used to perceive what is happening in the physical world, so any deviations can be detected and addressed accordingly.); and 
means for changing a weight of one or more of the first simulated signal or the second simulated signal based on the deviation. ([0054] During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset.)
Regarding Claim 11, Tremblay teaches further including means for creating source code for the robot to execute the perceptual skill ([0006] FIGS. 3A and 3B illustrate example components that can be utilized to generate plans and enable robotic devices to perform tasks corresponding to those plans in accordance with various embodiments. [0082] Storage media and other non-transitory computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art), the source code including the weight ([0041] Each stage in this example is a series of convolutional/ReLU layers with weights that are learned during training.).
Regarding Claim 12, Tremblay teaches wherein the first percept is a contact-less percept.  ([0029] The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] The number of belief maps produced can correspond to the number of features to be located)
Regarding Claim 13, Tremblay teaches wherein the second percept is a contact percept. ([0029] The percepts can then be fed to a plan generation network 304, which can be another deep neural network that can process the percepts to generate, or infer, a human-readable plan corresponding to the task. This plan can be provided to a user for confirmation, and can enable another performance or editing of the plan if one or more changes are required. Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan [0030] The number of belief maps produced can correspond to…the number of objects corresponding to the task) 
Regarding Claim 14, Tremblay teaches wherein the variations are displacements from the trajectory. ([0040] An example system relies on image-centric domain randomization for training the perception network. The ability to generate human-interpretable representations can be important for modularity and stronger generalization.) 
Regarding Claim 17, Tremblay teaches A non-transitory computer readable medium comprising computer readable instructions that, when executed, cause one or more processors to at least ([0077] FIG. 7 illustrates a set of basic components of a computing device 700 that can be utilized to implement aspects of the various embodiments. In this example, the device includes at least one processor 702 for executing instructions that can be stored in a memory device or element 704): 
identify a first percept and a second percept from data gathered from a demonstration of a task ([0029] a perception network 302 can be a deep neural network that accepts the demonstration data captured of the performance, such as may include image, distance, and other data as discussed herein. The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] the perception network 302 can include or utilize two neural networks.); 
map a trajectory based on the first percept and the second percept, the first percept skewed based on the first unidimensional saliency value, the second percept skewed based on the second unidimensional saliency value ([0041] Given a single image, a perception network can infer the locations of objects in the scene and their relationships. These networks can perform object detection with pose estimation, as well as relationship inference. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE} [0045] a primary purpose of various embodiments is to learn a human-readable program from a real-world demonstration. While a sensor such as a camera watches the scene, an agent (such as a person) can move the objects or perform the actions. As the demonstration is being performed, the perception network detects the objects and their relationships. Once the demonstration is complete, the state tensor from the relationship inference is thresholded to yield a set of discrete relationships between the objects. This tensor is sent to a program generation network which outputs a human-readable plan to execute.); 
determine a collection of trajectories including the trajectory and the variations of the trajectory (Fig. 2A-2D [0020] This can include, for example, a person performing a task in a task environment involving one or more objects. In a manufacturing environment this might involve assembling two or more parts, while in a warehouse setting this might involve stacking objects or placing those objects on specific shelves. For a healthcare environment this might involve sanitizing a piece of medical equipment, while in a home environment this might involve emptying the dishwasher. Various other types of tasks can be performed as well within the scope of the various embodiments as would be apparent in light of the teachings and suggestions contained herein. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE}); 
imitate an action based on a first simulated signal from a first neural network of a first modality and a second simulated signal from a second neural network of a second modality, the action representative of a perceptual skill ([0029] Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan.); and
create source code for a robot to execute the perceptual skill. (Claim 1 using an execution neural network and the plan, an instruction readable by the robotic device to cause a robotic device, upon execution of the instruction, to perform the task)
Tremblay does not expressly disclose but Wood discloses calculate a first unidimensional saliency value of the first percept and a second unidimensional saliency value of the second percept ([0113] Given a task definition, the architecture of the present invention first determines and stores the task-relevant/salient entities in working memory, using prior knowledge stored in the long-term memory of AN or HACM circuits analogous to ART circuits. For a given scene, the model then attempts to detect the most relevant entity by biasing its visual attention with the entity's learned low-level features. It then attends to the most salient location in the scene and attempts to recognize the object (in the TC) using AN or HACM circuits analogous to ART circuits that resonate with the features found in the salient location. The system updates its working memory with the task-relevance of the recognized entity and updates a topographic task relevance map (in the PC) with the location of the recognized entity. The stored objects and task-relevance maps are subsequently used by the PFC to construct predictions or plans [0115] The present invention includes an internal valuation module 510 to mimic basic human motivations. The internal valuation module 510 is configured to evaluate the value of the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and the context. For example, the internal valuation module values the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and context such that they are modeled mathematically to have a value in a range between zero and one, where zero is the least valuable and one is the most valuable. )
In this way, the system of Wood includes devices that can recognize, interpret, process and simulate human reactions and affects such as emotional responses to internal and external sensory stimuli, that provides real-time reinforcement learning modeling that reproduces human affects and/or reactions. Like Tremblay, Wood is concerned with learning methods.
Therefore, from these teachings of Wood and Tremblay, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Wood to the system of Tremblay since doing so would enhance the system by including architecture that integrates models rooted in neural principles, mechanisms, and computations for which there is neuro-physiological data and which link to human behaviors based on a large body of psychophysical data.
Regarding Claim 18, Tremblay teaches wherein the instructions cause the one or more processors to: determine a deviation of the action from a mean of the collection of trajectories ([0027] A robot capturing image data representative of these actions could analyze the image data to determine orientation, location, relationship, and other information about the objects, such as will be described later herein with the approximation 260 illustrated in FIG. 2D. [0034] During execution of the program in a closed loop system, new image data is produced that can be used to perceive what is happening in the physical world, so any deviations can be detected and addressed accordingly.); and change a weight of one or more of the first simulated signal or the second simulated signal based on the deviation.  ([0054] During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset.)
Regarding Claim 19, Tremblay teaches wherein the source code includes the weight. ([0041] Each stage in this example is a series of convolutional/ReLU layers with weights that are learned during training.)
Regarding Claim 20, Tremblay teaches wherein the first percept is a contact-less percept.  ([0029] The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] The number of belief maps produced can correspond to the number of features to be located)
Regarding Claim 21, Tremblay teaches wherein the second percept is a contact percept.  ([0029] The percepts can then be fed to a plan generation network 304, which can be another deep neural network that can process the percepts to generate, or infer, a human-readable plan corresponding to the task. This plan can be provided to a user for confirmation, and can enable another performance or editing of the plan if one or more changes are required. Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan [0030] The number of belief maps produced can correspond to…the number of objects corresponding to the task)
Regarding Claim 22, Tremblay teaches wherein the variations are random displacements from the trajectory.  ([0040] An example system relies on image-centric domain randomization for training the perception network. The ability to generate human-interpretable representations can be important for modularity and stronger generalization.) 
Regarding Claim 41, Tremblay teaches A method comprising: (Claim 1: A computer-implemented method)
identifying, by executing an instruction with a processor  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.), a first percept and a second percept from data gathered from a demonstration of a task; ([0029] a perception network 302 can be a deep neural network that accepts the demonstration data captured of the performance, such as may include image, distance, and other data as discussed herein. The perception network can process the demonstration data to generate a set of observations or “percepts” about the task. As mentioned, this can include relationships among the objects or actions taken with respect to the objects throughout the performance. [0030] the perception network 302 can include or utilize two neural networks.)
calculating, by executing an instruction with the processor  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.), 
mapping, by executing an instruction with the processor,  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.) a trajectory based on the first percept and the second percept, the first percept skewed based on the first unidimensional saliency value, the second percept skewed based on the second unidimensional saliency value; ([0041] Given a single image, a perception network can infer the locations of objects in the scene and their relationships. These networks can perform object detection with pose estimation, as well as relationship inference. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE} [0045] a primary purpose of various embodiments is to learn a human-readable program from a real-world demonstration. While a sensor such as a camera watches the scene, an agent (such as a person) can move the objects or perform the actions. As the demonstration is being performed, the perception network detects the objects and their relationships. Once the demonstration is complete, the state tensor from the relationship inference is thresholded to yield a set of discrete relationships between the objects. This tensor is sent to a program generation network which outputs a human-readable plan to execute.);
determining, by executing an instruction with the processor  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.), a collection of trajectories including the trajectory and variations of the trajectory; (Fig. 2A-2D [0020] This can include, for example, a person performing a task in a task environment involving one or more objects. In a manufacturing environment this might involve assembling two or more parts, while in a warehouse setting this might involve stacking objects or placing those objects on specific shelves. For a healthcare environment this might involve sanitizing a piece of medical equipment, while in a home environment this might involve emptying the dishwasher. Various other types of tasks can be performed as well within the scope of the various embodiments as would be apparent in light of the teachings and suggestions contained herein. [0043] After objects have been detected, their relationships can be inferred. This is accomplished via a fully connected neural network. The inputs to the network are the image coordinates of the vertices of two detected cuboids, and the output is a symbol from a set of relationships, such as the set {ABOVE, LEFT, NONE});
imitating, by executing an instruction with the processor  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.), an action based on a first simulated signal from a first neural network of a first modality and a second simulated signal from a second neural network of a second modality, the action representative of a perceptual skill; ([0029] Once the plan is confirmed, the plan (and any relevant related data) can be provided to an execution network 306, which can be a deep neural network capable of processing data for the plan and generating, or inferring, one or more robot-readable instructions (i.e., readable by a computer processor or control system) for performing one or more actions corresponding to the plan.)
creating, by executing an instruction with the processor  ([0023] The robotic device can include at least one processor (e.g., a CPU or GPU) to execute the application and/or perform tasks on behalf of the application, and memory 110 for including non-transitory computer-readable instructions for execution by the processor.), source code for a robot to execute the perceptual skill; and transmitting the source code to the robot.  (Claim 1 using an execution neural network and the plan, an instruction readable by the robotic device to cause a robotic device, upon execution of the instruction, to perform the task)
Tremblay does not expressly disclose but Wood discloses a first unidimensional saliency value of the first percept and a second unidimensional saliency value of the second percept; ([0113] Given a task definition, the architecture of the present invention first determines and stores the task-relevant/salient entities in working memory, using prior knowledge stored in the long-term memory of AN or HACM circuits analogous to ART circuits. For a given scene, the model then attempts to detect the most relevant entity by biasing its visual attention with the entity's learned low-level features. It then attends to the most salient location in the scene and attempts to recognize the object (in the TC) using AN or HACM circuits analogous to ART circuits that resonate with the features found in the salient location. The system updates its working memory with the task-relevance of the recognized entity and updates a topographic task relevance map (in the PC) with the location of the recognized entity. The stored objects and task-relevance maps are subsequently used by the PFC to construct predictions or plans [0115] The present invention includes an internal valuation module 510 to mimic basic human motivations. The internal valuation module 510 is configured to evaluate the value of the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and the context. For example, the internal valuation module values the sensory input data as representing velocity and acceleration vectors corresponding to sensory input-specific features and context such that they are modeled mathematically to have a value in a range between zero and one, where zero is the least valuable and one is the most valuable. )
In this way, the system of Wood includes devices that can recognize, interpret, process and simulate human reactions and affects such as emotional responses to internal and external sensory stimuli, that provides real-time reinforcement learning modeling that reproduces human affects and/or reactions. Like Tremblay, Wood is concerned with learning methods.
Therefore, from these teachings of Wood and Tremblay, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Wood to the system of Tremblay since doing so would enhance the system by including architecture that integrates models rooted in neural principles, mechanisms, and computations for which there is neuro-physiological data and which link to human behaviors based on a large body of psychophysical data.
Regarding Claim 42, Tremblay teaches further including: 
determining, by executing an instruction with the processor, a deviation of the action from a mean of the collection of trajectories ([0027] A robot capturing image data representative of these actions could analyze the image data to determine orientation, location, relationship, and other information about the objects, such as will be described later herein with the approximation 260 illustrated in FIG. 2D. [0034] During execution of the program in a closed loop system, new image data is produced that can be used to perceive what is happening in the physical world, so any deviations can be detected and addressed accordingly.); and
changing, by executing an instruction with the processor, a weight of one or more of the first simulated signal or the second simulated signal based on the deviation.  ([0054] During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset.)
Regarding Claim 43, Tremblay teaches wherein the variations are displacements from the trajectory.  ([0040] An example system relies on image-centric domain randomization for training the perception network. The ability to generate human-interpretable representations can be important for modularity and stronger generalization.) 
Claims 7, 8, 15, 16, 23, 24, 44, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Tremblay (US 20190228495 A1) in view of Wood (US 20210034959 A1) in further view of Kaiser (“Extracting Whole-Body Affordances from Multimodal Exploration”) 
Regarding Claim 7, Tremblay does not expressly disclose, but Kaiser discloses wherein the processor circuitry is to calculate the first unidimensional saliency value based on an identified surface in the first percept. (C. Determining Reachable Hypotheses: The geometric shapes of the primitives together with the suitable end effector poses allow us to assign a stability value to each point x ∈ ∂pi on the surface of the primitive pi: This value tells how stable it is for the robot to reach the different points on the primitive’s surfaces while maintaining the preferred end effector orientation)
In this way, the system of Kaiser includes the stability of the robot to reach different points on the primitive’s surfaces. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot to reach different points on the primitive’s surfaces.
Regarding Claim 8, Tremblay does not expressly disclose, but Kaiser discloses wherein the processor circuitry is to: identify a constraint in the environment, and map the trajectory based on the constraint.  (B. Stability Maps: Each voxel of a stability map Se tells how stable the robot would be when achieving the requested end effector pose p. Stability maps, as well as reachability maps are generated offline (see [11], [12]). Pose validation methods ensure that all considered robot configurations respect constraints like self-collisions or, as in this paper, static stability)
In this way, the system of Kaiser includes the stability of the robot with respect to constraints during all robot configurations. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot with respect to constraints during all robot configurations.
Regarding Claim 15, Tremblay does not expressly disclose, but Kaiser discloses wherein the means for calculating saliency is to calculate the first saliency value based on an identified surface in the first percept. (C. Determining Reachable Hypotheses: The geometric shapes of the primitives together with the suitable end effector poses allow us to assign a stability value to each point x ∈ ∂pi on the surface of the primitive pi: This value tells how stable it is for the robot to reach the different points on the primitive’s surfaces while maintaining the preferred end effector orientation)
In this way, the system of Kaiser includes the stability of the robot to reach different points on the primitive’s surfaces. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot to reach different points on the primitive’s surfaces.
Regarding Claim 16, Tremblay does not expressly disclose, but Kaiser discloses further including means for identifying a constraint in the environment, and the means for mapping a trajectory is to map the trajectory based on the constraint. (B. Stability Maps: Each voxel of a stability map Se tells how stable the robot would be when achieving the requested end effector pose p. Stability maps, as well as reachability maps are generated offline (see [11], [12]). Pose validation methods ensure that all considered robot configurations respect constraints like self-collisions or, as in this paper, static stability)
In this way, the system of Kaiser includes the stability of the robot with respect to constraints during all robot configurations. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot with respect to constraints during all robot configurations.
Regarding Claim 23, Tremblay does not expressly disclose, but Kaiser discloses wherein the instructions cause the one or more processors to calculate the first unidimensional saliency value based on an identified surface in the first percept. (C. Determining Reachable Hypotheses: The geometric shapes of the primitives together with the suitable end effector poses allow us to assign a stability value to each point x ∈ ∂pi on the surface of the primitive pi: This value tells how stable it is for the robot to reach the different points on the primitive’s surfaces while maintaining the preferred end effector orientation)
In this way, the system of Kaiser includes the stability of the robot to reach different points on the primitive’s surfaces. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot to reach different points on the primitive’s surfaces.
Regarding Claim 24, Tremblay does not expressly disclose, but Kaiser discloses wherein the instructions cause the one or more processors to: identify a constraint in the environment; and map the trajectory based on the constraint. (B. Stability Maps: Each voxel of a stability map Se tells how stable the robot would be when achieving the requested end effector pose p. Stability maps, as well as reachability maps are generated offline (see [11], [12]). Pose validation methods ensure that all considered robot configurations respect constraints like self-collisions or, as in this paper, static stability)
In this way, the system of Kaiser includes the stability of the robot with respect to constraints during all robot configurations. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot with respect to constraints during all robot configurations.
Regarding Claim 44, Tremblay does not expressly disclose, but Kaiser discloses wherein the calculating of the first unidimensional saliency value is based on an identified surface in the first percept. (C. Determining Reachable Hypotheses: The geometric shapes of the primitives together with the suitable end effector poses allow us to assign a stability value to each point x ∈ ∂pi on the surface of the primitive pi: This value tells how stable it is for the robot to reach the different points on the primitive’s surfaces while maintaining the preferred end effector orientation)
In this way, the system of Kaiser includes the stability of the robot to reach different points on the primitive’s surfaces. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot to reach different points on the primitive’s surfaces.
Regarding Claim 45, Tremblay does not expressly disclose, but Kaiser discloses further including: 
identifying, by executing an instruction with the processor, a constraint in the environment; and mapping, by executing an instruction with the processor, the trajectory based on the constraint. (B. Stability Maps: Each voxel of a stability map Se tells how stable the robot would be when achieving the requested end effector pose p. Stability maps, as well as reachability maps are generated offline (see [11], [12]). Pose validation methods ensure that all considered robot configurations respect constraints like self-collisions or, as in this paper, static stability)
In this way, the system of Kaiser includes the stability of the robot with respect to constraints during all robot configurations. Like Tremblay, Kaiser is concerned with robotic actions in an environment.
Therefore, from these teachings of Tremblay and Kaiser, one of ordinary skill in the art at the time the invention was made would have found it obvious to apply the teachings of Kaiser to the system of Tremblay since doing so would enhance the system by including the stability of the robot with respect to constraints during all robot configurations.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH TRAN, whose telephone number is (313) 446-6642. The examiner can normally be reached 7:30am-4:30pm M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Khoi Tran, can be reached at (571) 272-6919. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.A.T./ Examiner, Art Unit 3664
/KHOI H TRAN/ Supervisory Patent Examiner, Art Unit 3664