United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 16/255,038
Filing Date: January 23, 2019 
Appellant(s): Jonathan Tremblay 



__________________
Joe Grdinovac
For Appellant


EXAMINER’S ANSWER




This is in response to the appeal brief filed 11/24/2021, appealing from the Final Office action mailed 3/18/2021.
(1) Grounds of Rejection to be Reviewed on Appeal
The following grounds of rejection are applicable to the appealed claims. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning”, IEEE, April 2017, pages 397-403) in view of Li et al. (US 20180349527 A1, which claims priority to U.S. Provisional Application US 62515456, filed on June 5, 2017), and further in view of Goyal (US 20170316312 A1).
	Re Claim 1: Yang discloses a computer-implemented method (see Yang: e.g., --to collect data and exhibits the following characteristics: task performing capability, task reiteration ability, generalizability, and easy applicability. …. collecting task operating data, especially for tasks that are difficult to be applied with a conventional method. A two-phase deep learning model is also utilized in the proposed approach. A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract), comprising:
A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract, and, -- manipulation tasks and incorporate some type of smart control. The deep learning method has been applied to static image recognition [7].--, in left col., page 398; and see Fig. 3, “Input, image features”, the caption of Fig. 3, and the paragraphs below it, such as: -- In this study, training data for DCAE utilize sequential images acquired from the robot-mounted camera. The target of each input image is the original input data, and the mean square error (MSE) is used to modify the weight of neural networks by using Adam optimization [18].--, on page 399);
	inferring, using the image data as input to a perception neural network, object(s) changes resulting from performance of the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399; the predicted data is the “output” of the neural network in the training phase, such as the area changed ratio);
	Yang, however, does not explicitly disclose inferring, using the image data as input to a perception neural network, a relationship between at least two objects.
	Li teaches inferring, using the image data as input to a perception neural network, a relationship between at least two objects (see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034]) {i.e., a relationship among the objects and robots is inferred. The above disclosures of Li are fully supported by Li’s Provisional Application US 62515456, filed on June 5, 2017; see page 6 of “Robot Learning”, pages 13-14 of “Machine Learning Machine”, and page 34 of “Learning for LegoBot Training” (reproduced below for review)}:


    [Image: reproduced excerpts of Li’s Provisional Application US 62515456]

	Yang and Li are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Yang’s method using Li’s teachings by including inferring, using the image data as input to a perception neural network, a relationship between at least two objects to Yang’s predicted data output from the training neural network, in order to enable prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning (see Li: e.g., in [0028]-[0034]).
Yang as modified by Li, however, does not explicitly disclose inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects.
Goyal teaches inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects (see Goyal: e.g., -- inference, which applies the trained machine learning models to actual applications.--, in [0003], and, -- The DLP 102 is also configured to provide deep learning processing results by the DLP 102 back to the host 103--, in [0021]; also see: -- DLP is optimized for the inference phase of deep learning processing to achieve capital and operational efficiency--, in [0018]).
	Yang (as modified by Li) and Goyal are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Yang (as modified by Li) using Goyal’s teachings by including inferring, using the relationship as input to a plan generation neural network, a plan corresponding to the relationship between the at least two objects in the machine learning-based solution of Yang (as modified by Li) for controlling robots that can perform tasks in an uncertain environment, such as a production line with human workers, in order to achieve capital and operational efficiency (see Goyal: e.g., in [0018]-[0021]).
Yang as modified by Li and Goyal further disclose the plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {here, such “generate simulated output representing a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it; the same disclosures are supported by Li’s Provisional App. ’456}).
Yang as modified by Li and Goyal further disclose receiving confirmation of the plan (see Yang: e.g., Fig. 2, Figs. 6-7, and, -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, on page 402);
	inferring, using an execution neural network and the plan, an instruction readable by a robotic device to cause the robotic device, upon execution of the instruction, to perform the task (see Yang: e.g., Li’s disclosures of “generate simulated output for controlling the behavior of the physical process in performing a task” and “a kinematic solution for performing the task in the virtual reality environment”, which is a plan that is output to the end-user and enables the end-user to interact with it, would properly and obviously be combined with, and are interchangeable with, Yang’s Figs. 6 and 7 (reproduced below for review).


    [Image: Yang, Figs. 6 and 7 (reproduced)]


	And, 

    [Image: reproduced figure from Yang]


From Fig. 2 above and the corresponding disclosure on page 3 of Yang, the robotic device would execute a task, such as folding paper, by following the plan of the task, or the instruction, illustrated in the middle portion of Fig. 2 and in Figs. 6 and 7; also, the operator provides a motion command, and this motion command constitutes receiving confirmation of the plan; and, -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, on page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, on page 403).

	Re Claim 2, Yang as modified by Li and Goyal further disclose inferring, using the image data as input to an object detection network, a set of belief maps representative of the at least two objects (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, on page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]).

	Re Claim 3, Yang as modified by Li and Goyal further disclose inferring, using the location probabilities as input to a relationship inference network, the relationship between the at least two objects (see Li: e.g., -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210...--, in [0033]-[0034]).

	Re Claim 4, Yang as modified by Li and Goyal further disclose providing the instruction to a control system of the robotic device, the robotic device storing a set of pre-scripted behaviors enabling the robotic device to perform the task according to the instruction (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, on page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, on page 403).
	
	Re Claim 5, Yang as modified by Li and Goyal further disclose causing the robotic device to perform the task using the instruction (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, on page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, on page 403).

	Re Claim 6, Yang discloses a computer-implemented method, comprising:
receiving data representative of a task to be performed by an automated device (see Yang: e.g., --to collect data and exhibits the following characteristics: task performing capability, task reiteration ability, generalizability, and easy applicability. …. collecting task operating data, especially for tasks that are difficult to be applied with a conventional method. A two-phase deep learning model is also utilized in the proposed approach. A deep convolutional autoencoder extracts images features and reconstructs images, and a fully connected deep time delay neural network learns the dynamics of a robot task process from the extracted image features and motion angle signals.--, in abstract; and, -- manipulation tasks and incorporate some type of smart control. The deep learning method has been applied to static image recognition [7].--, in left col., page 398; and see Fig. 3, “Input, image features”, the caption of Fig. 3, and the paragraphs below it, such as: -- In this study, training data for DCAE utilize sequential images acquired from the robot-mounted camera. The target of each input image is the original input data, and the mean square error (MSE) is used to modify the weight of neural networks by using Adam optimization [18].--, on page 399);
Yang, however, does not explicitly disclose inferring, using a first neural network and the received data, a plan corresponding to the task.
Goyal teaches inferring, using a first neural network and the received data, a plan corresponding to the task (see Goyal: e.g., -- inference, which applies the trained machine learning models to actual applications.--, in [0003], and, -- The DLP 102 is also configured to provide deep learning processing results by the DLP 102 back to the host 103--, in [0021]; also see: -- DLP is optimized for the inference phase of deep learning processing to achieve capital and operational efficiency--, in [0018]).
Yang and Goyal are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Yang’s method using Goyal’s teachings by including inferring, using a first neural network and the received data, a plan corresponding to the task in Yang’s machine learning-based solution for controlling robots that can perform tasks in an uncertain environment, such as a production line with human workers, in order to achieve capital and operational efficiency (see Goyal: e.g., in [0018]-[0021]).
Yang as modified by Goyal, however, does not explicitly disclose a second neural network.
Li discloses a second neural network (see Li: e.g., -- and transmitting the first augmented image to a training pipeline for an additional machine learning model that controls a behavior of the physical process.--, in claims 11 and 17; also see: -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034]).
Li also teaches a plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {here, such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it}).
Yang (as modified by Goyal) and Li are combinable as they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Yang (as modified by Goyal) using Li’s teachings by including inferring a plan providing at least a human-readable representation of the task {enabling the end-user to interact with it}, and a second neural network that controls a behavior of the physical process, to the predicted data output from the training neural network of Yang (as modified by Goyal), in order to enable prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning (see Li: e.g., in [0028]-[0034], and [0036]).
Yang as modified by Li and Goyal further disclose causing the task to be performed by the automated device using a second neural network and the plan corresponding to the task (see Yang: e.g., -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, and, --the proposed model has shown a powerful ability for managing higher-dimension image data, and it has been proven that the model can provide a relatively stable signal for TDNN online generation. From experiment, the visual information is sufficient for task doing with the stable environment settings--, on page 402; also see: --to achieve a humanoid robot worker that can perform the folding task repeatedly with good generalizability….training data are successfully collected in a teleoperation, and the proposed approach successfully allows a non-backdrivable humanoid robot to complete the folding task--, on page 403).

	Re Claim 8, Yang as modified by Li and Goyal further disclose inferring, using the data as input to a perception neural network, a relationship between at least two objects resulting from performance of the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399; the predicted data is the “output” of the neural network in the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {i.e., a relationship among the objects and robots is inferred}).

	Re Claim 9, Yang as modified by Li and Goyal further disclose inferring, using the data as input to an object detection network, a set of belief maps indicative of locations of the at least two objects (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, on page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]); and
identifying location probabilities for one or more features of the at least two objects from the belief maps (see Yang: e.g., -- utilized convolutional layers to present a DCAE that can handle a high-resolution image to the small size of feature map. Convolutional layers with a stride can extract features and down-sample the dimension of information. Deconvolutional layers are used to reconstruct images from the encoded feature map.--, on page 399; and also see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]; and, -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210...--, in [0033]-[0034]).

Re Claim 10, Yang as modified by Li and Goyal further disclose inferring, using the location probabilities as input to a relationship inference network, the relationship between the at least two objects (see Li: e.g., -- mappings between simulated images generated from models of physical objects and real-world images of the physical objects--, in abstract, and, -- Machine learning models 208 may identify and/or include mappings 218 between simulated images 206 of objects generated by simulation engine 120 and real-world images of the same objects.  To produce mappings 218, machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).  After mappings 218 are generated (e.g., after machine learning models 208 are trained), machine learning models 208 may produce, from simulated images 206, augmented images 220 of the objects that are highly similar to and/or effectively indistinguishable from real-world images of the same objects.--, in [0031]; and, -- After training is complete, the generator neural network may learn the distributions of simulated training data 214 and real-world training data 216, and the discriminator neural network may learn to predict the probability that a given image is simulated or real… Since machine learning models 208 are trained using real-world training data 216 containing images collected from an environment that is identical or similar to the one in which physical process 202 operates, augmented images 220 may imitate the shading, lighting, noise, and/or other real-world conditions encountered by physical process 202 in performing the task.  Augmented images 220 and the corresponding labels (e.g., object positions, object orientations, object types, graspable points in each object, depth information and/or 3D locations of objects or features in augmented images 220, etc.) from simulation engine 120 may then be used as training data 212 for machine learning model 210...--, in [0033]-[0034]).

Re Claim 11, Yang as modified by Li and Goyal further disclose inferring, using the relationship as input to the plan generation neural network, the plan corresponding to the task, the human-readable representation identifying at least one action corresponding to the relationship between the at least two objects (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399; the predicted data is the “output” of the neural network in the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {i.e., a relationship among the objects and robots is inferred}; and, -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {here, such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it}).

Re Claim 12, Yang as modified by Li and Goyal further discloses providing the human-readable representation for review by a human reviewer (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicting data as “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}); and
causing the task to be performed by the automated device in response to receiving confirmation of the human-readable representation (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicting data as “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claim 13, Yang as modified by Li and Goyal further discloses that the human-readable representation is capable of being updated by capturing additional data for another physical demonstration of the task or through a manual updating by the human reviewer (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicting data as “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

	Re Claim 14, Yang as modified by Li and Goyal further discloses inferring, using the execution neural network, an instruction readable by the automated device to cause the automated device, upon execution of the instruction, to perform the task (see Yang: e.g., -- the proposed model can handle raw input data adaptively to deal with small changes in the environment and perform corresponding motions from the output command signal.--, on page 399, predicting data as “output” from the neural network of the training phase, such as the area changed ratio; also see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

	Re Claim 15, Yang as modified by Li and Goyal further discloses that the data is captured using at least one of a digital camera, stereoscopic camera, infrared image sensor, structured light camera, depth sensor, ultrasonic sensor, LIDAR detector, microphone, motion capture system, or motion detector (see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred}).

Re Claims 16-20, claims 16-20 are the system claims corresponding to claims 6-11 and 4, respectively. Thus, claims 16-20 are rejected for reasons similar to those discussed with regard to claims 6-11. Further, Yang as modified by Li and Goyal discloses a system comprising: at least one processor; and memory including instructions that, when executed by the at least one processor, cause the system to perform the method (see Li: e.g., Fig. 1, and [0017]-[0024]).



 (2) Response to Arguments
2.1. The Rejections of Independent Claims 1 and 12 under 35 U.S.C. § 103:
2.1.1 	Regarding Claim 1:
Appellant (beginning on page 6 of the Appeal Brief) argues that the cited portions of the Li reference are not entitled to the priority date of its Provisional Application.
However, Appellant is incorrect, because Li is cited in the Final Office Action of 3/18/2021 (as provided above) for the rejection of the following limitation:
Li teaches inferring, using the image data as input to a perception neural network, a relationship between at least two objects (see Li: e.g., -- objects in different positions and/or orientations; and/or simulated images 206 that reflect different camera positions and/or viewing angles of the objects…. prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning model 210 during use of physical process 202 in a real-world setting (e.g., performing a task in a factory or lab environment)…. machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots). --, in [0028]-[0034] {so that a relationship among objects and robots is inferred});
	The above subject matter disclosed by Li is fully disclosed in, and finds sufficient support in, Li’s Provisional Application US 62515456, filed on June 5, 2017; see page 6 of “Robot Learning”, pages 13-14 of “Machine Learning Machine”, and page 34 of “Learning for LegoBot Training” (reproduced below for review):


[Image: media_image2.png, excerpts reproduced from Li’s Provisional Application US 62515456]


It is apparent that “machine learning models 208 may be trained using simulated training data 214 that includes simulated images 206 and real-world training data 216 that includes real-world images of the same objects (e.g., images of the objects captured by cameras mounted on other robots).” is fully disclosed in, and finds sufficient support in, Li’s Provisional Application US 62515456, such that Li is entitled to the priority date of its Provisional Application.
Yang and Li are combinable because they are in the same field of endeavor: neural networks used in robot learning to perform tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Yang’s method using Li’s teachings by incorporating, into Yang’s predicted data output from the training neural network, the inferring, using the image data as input to a perception neural network, of a relationship between at least two objects, in order to enable prediction and/or identification of real-world objects, object positions, and/or object orientations by machine learning (see Li: e.g., in [0028]-[0034]); and
Yang as modified by Li and Goyal further discloses the plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein such “a kinematic solution for performing the task in the virtual reality environment” is a plan, which is output to the end-user and enables the end-user to interact with it}).
Again, the above subject matter disclosed by Li is fully disclosed in, and finds sufficient support in, Li’s Provisional Application US 62515456, filed on June 5, 2017; see [0013]-[0015] of Li’s Provisional Application US 62515456 (reproduced below for review):

[0014] Pixel data can be provided to display processor 112 directly from CPU 102. In some embodiments of the present invention, instructions and/or data representing a scene are provided to a render farm or a set of server computers, each similar to system 100, via network adapter 118 or system disk 114. The render farm generates one or more rendered images of the scene using the provided instructions and/or data. These rendered images may be stored on computer-readable media in a digital format and optionally returned to system 100 for display. Similarly, stereo image pairs processed by display processor 112 may be output to other systems for display, stored in system disk 114, or stored on computer-readable media in a digital format.

[0015] Alternatively, CPU 102 provides display processor 112 with data and/or instructions defining the desired output images, from which display processor 112 generates the pixel data of one or more output images, including characterizing and/or adjusting the offset between stereo image pairs. The data and/or instructions defining the desired output images can be stored in system memory 104 or graphics memory within display processor 112. In an embodiment, display processor 112 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting shading, texturing, motion, and/or camera parameters for a scene. Display processor 112 can further include one or more programmable execution units capable of executing shader programs, tone mapping programs, and the like.

2.1.2 Appellant (beginning on page 7 of the Appeal Brief) argues that the cited references do not teach “the plan providing at least a human-readable representation of the task.”
However, the Examiner disagrees, because:
 	Li teaches a plan providing at least a human-readable representation of the task (see Li: e.g., -- I/O devices 108 may be configured to receive various types of input from an end-user (e.g., a designer) of computing device 100, and to also provide various types of output to the end-user of computing device 100, such as displayed digital images or digital videos or text.--, in [0019], and, -- simulation engine 120 may provide simulation 226 in a virtual reality environment in which users and/or other entities (e.g., animals, robots, machine learning systems, etc.) can generate simulated output for controlling the behavior of the physical process in performing a task such as interacting with an object, 3D printing, machining, assembly, grasping, mining, walking, cleaning, and/or drilling.  Within simulation 226, the entities may interact with the virtual reality environment to generate simulated output representing a kinematic solution for performing the task in the virtual reality environment.--, in [0036] {herein such “generate simulated output for controlling the behavior of the physical process in performing a task” and “a kinematic solution for performing the task in the virtual reality environment” constitute a plan, which is output to the end-user and enables the end-user to interact with it}).
	Furthermore, claim 1, considered as a whole, is directed to a computer simulation method of causing a robotic device to perform a task.
	The primary reference Yang, as modified by Li and Goyal, clearly discloses such a computer simulation method of causing a robotic device to perform a task. As discussed in the obviousness and motivation statements in the Final Office Action regarding the rejection of the limitations of claim 1, Li’s disclosure of “generate simulated output for controlling the behavior of the physical process in performing a task” and “a kinematic solution for performing the task in the virtual reality environment”, which is a plan that is output to the end-user and enables the end-user to interact with it, is properly and obviously combinable and interchangeable with Yang’s teachings (Fig. 6 and Fig. 7, reproduced for review).


[Image: media_image3.png, Yang’s Fig. 6 and Fig. 7, reproduced for review]


	And:

[Image: media_image4.png, additional figure reproduced from Yang]


From Fig. 2 above and the corresponding disclosure on Yang’s page 3, the robotic device would execute a task, such as folding paper, by following the plan of the task, or the instruction illustrated as the image in the middle portion of Fig. 2 and in Figs. 6 and 7. Also, the operator provides a motion command, and this motion command constitutes receiving confirmation of the plan; see Yang: e.g., Fig. 2, Figs. 6-7, and, -- To evaluate the reiteration ability, an experimenter stands in front of table facing to robot and disturbs the task while the robot performs the folding task (Fig. 9). It is confirmed that the robot can repeat the task even when disturbed during online generation, which proves the robustness of the proposed model.--, on page 402.

Therefore, the cited references teach every element of independent claim 1, and similarly disclose every element of independent claims 6 and 16. The rejections of independent claims 1, 6, and 16, and of the remaining dependent claims 2-5, 7-15, and 17-20, are proper, and it is most respectfully requested that they be sustained.

For the remaining dependent claims, Appellant does not raise any further arguments. Therefore, claims 1-20 are still not allowable over the cited art.

For the above reasons, it is most respectfully believed that the rejections should be sustained.

Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.


Respectfully submitted,


/WEI WEN YANG/Primary Examiner, Art Unit 2667

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667                                                                                                                                                                                                         
/EMILY C TERRELL/Supervisory Patent Examiner, Art Unit 2666