DETAILED ACTION
Response to Arguments
The amendments filed 12/14/2020 have been entered and made of record. 

Applicant's amendments and arguments filed 12/14/2020 have been fully considered but they are not persuasive.
	With regard to the newly added amendment, which recites the limitation “the plan providing at least a human-readable representation of the task”, Applicant states that the cited references do not teach such a plan providing at least a human-readable representation of the task. 
However, the Examiner disagrees, because:
 	Li teaches a plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein, such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it}).

	Therefore, claims 1-20 are still not patentably distinguishable over the prior art reference(s). Further discussions are addressed in the prior art rejection section below.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning”, IEEE, April 2017, pages 397-403), in view of Li et al. (US 20180349527 A1, claiming priority to U.S. provisional application US 62515456, filed on June 5, 2017), and further in view of Goyal (US 20170316312 A1).
	Re Claim 1, Yang discloses a computer-implemented method (see Yang: e.g., --to collect data and exhibits the following characteristics: task performing capability, task reiteration ability, generalizability, and easy applicability. …. collecting task operating data, especially for tasks that are difficult to be applied with a conventional method. A two-phase deep learning model is also utilized in the proposed approach. A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract), comprising:
	receiving image data representative of a task being physically performed (see Yang: e.g., collecting task operating data, especially for tasks that are difficult to be applied with a conventional method. A two-phase deep learning model is also utilized in the proposed approach. A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract, and, -- manipulation tasks and incorporate some type of smart control. The deep learning method has been applied to static image recognition [7].--, in left col., page 398; and see Fig. 3, “Input, image features”, and caption of Fig. 3 and paragraphs below, such as: -- In this study, training data for DCAE utilize sequential images acquired from the robot-mounted camera. The target of each input image is the original input data, and the mean square error (MSE) is used to modify the weight of neural networks by using Adam optimization [18].--, in page 399);
	inferring, using the image data as input to a perception neural network, object(s) changes resulting from performance of the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, in page 399; predicted data is “output” from the neural network of the training phase, such as the area changed ratio);
	Yang, however, does not explicitly disclose inferring, using the image data as input to a perception neural network a relationship between at least two objects;
	Li teaches inferring, using the image data as input to a perception neural network a relationship between at least two objects (see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred});
	Yang and Li are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Yang’s method using Li’s teachings by including inferring, using the image data as input to a perception neural network a relationship between at least two objects to Yang’s predicted data output from the training neural network in order to enable prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning (see Li: e.g., in [0028]-[0034]);
Yang as modified by Li, however, does not explicitly disclose inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects;
Goyal teaches inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects (see Goyal: e.g., -- inference, which applies the trained machine learning models to actual applications.--, in [0003], and, -- The DLP 102 is also configured to provide deep learning processing results by the DLP 102 back to the host 103--, in [0021]; also see: -- DLP is optimized for the inference phase of deep learning processing to achieve capital and operational efficiency--, in [0018]);
	Yang (as modified by Li) and Goyal are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Yang (as modified by Li)’s method using Goyal’s teachings by including inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects to Yang (as modified by Li)’s solution of machine learning-based model to control robots that can perform tasks in an uncertain environment, such as a production line with human workers in order to achieve capital and operational efficiency (see Goyal: e.g., in [0018]-[0021]);
Yang as modified by Li and Goyal further disclose the plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enable the end-user to interact with}).
Yang as modified by Li and Goyal further disclose receiving confirmation of the plan (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, in page 402);
	inferring, using an execution neural network and the plan, an instruction readable by a robotic device to cause the robotic device, upon execution of the instruction, to perform the task (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, in page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, in page 403).

	Re Claim 2, Yang as modified by Li and Goyal further disclose inferring, using the image data as input to an object detection network, a set of belief maps representative of the at least two objects (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, in page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]).

	Re Claim 3, Yang as modified by Li and Goyal further disclose inferring, using the location probabilities as input to a relationship inference network, the relationship between the at least two objects (see Li: e.g., -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210..--, in [0033]-[0034]).

	Re Claim 4, Yang as modified by Li and Goyal further disclose providing the instruction to a control system of the robotic device, the robotic device storing a set of pre-scripted behaviors enabling the robotic device to perform the task according to the instruction (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, in page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, in page 403).
	
	Re Claim 5, Yang as modified by Li and Goyal further disclose causing the robotic device to perform the task using the instruction (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, in page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, in page 403).

	Re Claim 6, Yang discloses a computer-implemented method, comprising:
receiving data representative of a task to be performed by an automated device (see Yang: e.g., --to collect data and exhibits the following characteristics: task performing capability, task reiteration ability, generalizability, and easy applicability. …. collecting task operating data, especially for tasks that are difficult to be applied with a conventional method. A two-phase deep learning model is also utilized in the proposed approach. A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract; and, -- manipulation tasks and incorporate some type of smart control. The deep learning method has been applied to static image recognition [7].--, in left col., page 398; and see Fig. 3, “Input, image features”, and caption of Fig. 3 and paragraphs below, such as: -- In this study, training data for DCAE utilize sequential images acquired from the robot-mounted camera. The target of each input image is the original input data, and the mean square error (MSE) is used to modify the weight of neural networks by using Adam optimization [18].--, in page 399);
Yang, however, does not explicitly disclose inferring, using a first neural network and the received data, a plan corresponding to the task;
Goyal teaches inferring, using a first neural network and the received data, a plan corresponding to the task (see Goyal: e.g., -- inference, which applies the trained machine learning models to actual applications.--, in [0003], and, -- The DLP 102 is also configured to provide deep learning processing results by the DLP 102 back to the host 103--, in [0021]; also see: -- DLP is optimized for the inference phase of deep learning processing to achieve capital and operational efficiency--, in [0018]);
Yang and Goyal are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Yang’s method using Goyal’s teachings by including inferring, using a first neural network and the received data, a plan corresponding to the task to Yang’s solution of machine learning-based model to control robots that can perform tasks in an uncertain environment, such as a production line with human workers in order to achieve capital and operational efficiency (see Goyal: e.g., in [0018]-[0021]);
Yang as modified by Goyal, however, does not explicitly disclose a second neural network;
Li discloses a second neural network (see Li: e.g., -- and transmitting the first augmented image to a training pipeline for an additional machine learning model that controls a behavior of the physical process.--, in claims 11 and 17; also see: -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034]);
Li also teaches a plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enable the end-user to interact with}).
Yang (as modified by Goyal) and Li are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Yang (as modified by Goyal)’s method using Li’s teachings by including inferring a plan providing at least a human-readable representation of the task {enabling the end-user to interact with it}, and a second neural network that controls a behavior of the physical process to Yang (as modified by Goyal)’s predicted data output from the training neural network in order to enable prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning (see Li: e.g., in [0028]-[0034], and [0036]);
Yang as modified by Li and Goyal further disclose causing the task to be performed by the automated device using a second neural network and the plan corresponding to the task (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, in page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, in page 403).

	Re Claim 8, Yang as modified by Li and Goyal further disclose inferring, using the data as input to a perception neural network, a relationship between at least two objects resulting from performance of the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, in page 399; predicted data is “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

	Re Claim 9, Yang as modified by Li and Goyal further disclose inferring, using the data as input to an object detection network, a set of belief maps indicative of locations of the at least two objects (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, in page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical  objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]); and
identifying location probabilities for one or more features of the at least two objects from the belief maps (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, in page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  
After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]; and, -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210..--, in [0033]-[0034]).

Re Claim 10, Yang as modified by Li and Goyal further disclose inferring, using the location probabilities as input to a relationship inference network, the relationship between the at least two objects (see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]; and, -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  
Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210..--, in [0033]-[0034]).

Re Claim 11, Yang as modified by Li and Goyal further disclose inferring, using the relationship as input to the plan generation neural network, the plan corresponding to the task, the human-readable representation identifying at least one action corresponding to the relationship between the at least two objects (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, in page 399; predicted data is “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}; and, -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein, such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it}).

Re Claim 12, Yang as modified by Li and Goyal further disclose providing the human-readable representation for review by a human reviewer (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicted data as “output” from the neural network of the training phase, such as area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}); and
causing the task to be performed by the automated device in response to receiving confirmation of the human-readable representation (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicted data as “output” from the neural network of the training phase, such as area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claim 13, Yang as modified by Li and Goyal further disclose the human-readable representation is capable of being updated by capturing additional data for another physical demonstration of the task or through a manual updating by the human reviewer (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicted data as “output” from the neural network of the training phase, such as area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claim 14, Yang as modified by Li and Goyal further disclose inferring, using the execution neural network, an instruction readable by the automated device to cause the automated device, upon execution of the instruction, to perform the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicted data as “output” from the neural network of the training phase, such as area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claim 15, Yang as modified by Li and Goyal further disclose the data is captured using at least one of a digital camera, stereoscopic camera, infrared image sensor, structured light camera, depth sensor, ultrasonic sensor, LIDAR detector, microphone, motion capture system, or motion detector (see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claims 16-20, claims 16-20 are the system claims corresponding to claims 6-11, and 4, respectively. Thus, claims 16-20 are rejected for reasons similar to those discussed in regard to claims 6-11. Yang as modified by Li and Goyal further disclose a system comprising: at least one processor; and memory including instructions that, when executed by the at least one processor, cause the system to perform the method (see Li: e.g., Fig. 1, and in [0017]-[0024]).

Conclusion
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571) 270-5670.  The examiner can normally be reached from 8:00 am to 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WEI WEN YANG/
Primary Examiner, Art Unit 2667