DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This nonfinal action is in reply to the response filed on 7/5/2022.
Claims 1, 19, and 20 are amended. Claim 21 is added.
The Amendment filed 7/5/2022 has been entered.
Claims 1 and 3-21 are currently pending and have been examined.
This action is made NON-FINAL. 
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/5/2022 complies with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is considered by the examiner.
Response to Arguments
Applicant's arguments, see pages 10-11, with respect to Zhang not teaching “determining a transformation between the reference frame of the image and a reference frame of the robot” have been fully considered but they are not persuasive. According to the specification of the instant application, “the transformation between the reference frame of the image and the reference frame of the robot can also be determined/modified in real time by using observations of the robot with the camera to adjust an estimate of the transformation between the reference frame of the image and the reference frame of the robot” [0054]. Based upon this description, it is understood that the transformation between the reference frame of the image and the reference frame of the robot can be determined by a camera observation of the robot. Zhang teaches a datum, or a common reference frame, and the datum can be the robot itself ([0027] of Zhang). Zhang also teaches observing this datum with a camera centric point cloud and using it to convert the camera centric viewpoint to the robot centric viewpoint (see at least Zhang [0026-0030]). Additionally, the point cloud information acquired by the camera is position information in relation to the camera centric viewpoint. Therefore, the position of the recognized robot/datum from the camera centric point cloud is essentially a transformation between a reference frame of the image and a reference frame of the robot. Accordingly, Zhang is maintained as prior art for the claim limitation.
Upon further consideration, the Examiner has recognized that Zhang also teaches “determining a transformation between 

Applicant's arguments, see page 11, with respect to Zhang not teaching “the determined transformation between the reference frame of the image and the reference frame of the robot enabling conversion from locations in the image to locations in relation to the robot” have been fully considered but they are not persuasive. Applicant states that Zhang at best “discloses determining how the camera itself, and not the locations within the images, are related to the robot, and does not disclose converting locations in an image to locations in relation to a robot.” Examiner disagrees. According to Zhang [0026-0030], the camera produces a camera centric point cloud, objects are recognized in the point cloud, and the coordinate information of these objects from the camera perspective is transformed to the robot perspective so as to make the robot aware of the locations of the objects in relation to itself. Accordingly, Zhang teaches the amended subject matter of claims 1, 19, and 20 and the 103 rejections are maintained.
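For illustration only, the conversion discussed above (camera centric coordinates converted to robot centric coordinates by way of a recognized datum) can be sketched as follows. This sketch is not taken from Zhang or from the instant application. The function names, the restriction to rotation about a single axis, and all numeric poses are assumptions chosen to keep the example short.

```python
# Illustrative sketch only: not disclosed by Zhang or the instant application.
# All names and numeric values below are hypothetical.
import math


def make_transform(yaw, tx, ty, tz):
    """Build a 4x4 homogeneous transform: rotation about z by yaw plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert(t):
    """Invert a rigid transform using the transpose of R and -R^T p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def apply(t, point):
    """Map a 3D point through a 4x4 homogeneous transform."""
    x, y, z = point
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3] for i in range(3))


# Pose of the datum (e.g. a robot link) in the robot frame: assumed known.
robot_T_datum = make_transform(0.0, 0.5, 0.0, 0.2)
# Pose of the same datum as recognized in the camera centric point cloud.
camera_T_datum = make_transform(math.pi / 2, 1.0, 0.0, 0.0)

# Camera-to-robot transform obtained by chaining through the datum.
robot_T_camera = matmul(robot_T_datum, invert(camera_T_datum))

# A workspace object observed at a camera centric location.
object_in_camera = (1.0, 2.0, 0.0)
object_in_robot = apply(robot_T_camera, object_in_camera)
```

Under these assumptions, a single observation of the datum is enough to fix the camera-to-robot transform, after which any camera centric location can be re-expressed in relation to the robot.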

Applicant's argument, see pages 13-14, with respect to Claim 21 being novel and non-obvious is unpersuasive. Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), and Kuffner (US 20160016315 A1). See full rejection provided below.
Claim Objections
Upon further consideration, claims 1, 19, and 20 are objected to because of the following informalities:  
Claims 1, 19, and 20 recite both ‘a reference frame of an image’ and ‘a reference frame of the camera.’ The specification recites “the reference frame of the image refers to the spatial location and orientation of the camera image sensor” [0051] which leads the examiner to interpret that the reference frame of the image and the reference frame of the camera are the same thing.  For clarity, the examiner suggests amending the claim language so as to consistently refer to the particular reference frame as only one of either the image reference frame or the camera reference frame. If these reference frames are supposed to be interpreted to be different from each other, then an amendment should be made to differentiate the reference frames.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-5, 12, 14-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), and Sampedro (US 10131053 B1).
Regarding claims 1 and 19-20:
	Liu teaches: 
A system for controlling a robot for object avoidance, the system comprising:  a processor; and a memory with computer code instructions stored thereon, the processor and the memory, with the computer code instructions, being configured to cause the system to:; (“In one embodiment, the controller 100 comprises a processor 120 and a memory 110. The processor 120 processes data signals and may comprise various computing architectures such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets.” [0037])
A non-transitory computer program product for controlling a robot for object avoidance, the computer program product comprising a computer-readable medium with computer code instructions stored thereon, the computer code instructions being configured, when executed by a processor, to cause an apparatus associated with the processor to:; (“The embodiments can also be in a computer program product which can be executed on a computing system.” [0117])
A method for controlling a robot for object avoidance; (“A method, non-transitory computer-readable storage medium, and controller device controls an autonomous entity.” [0009])
detecting … a location of an object based on a camera image of the object,; (“Here, the measurements 263 could include sensor data such as radar, LiDAR, images, video, or other sensed data that can be used to predict a state of the other entities that may include, for example, position, velocity, acceleration, orientation, or other configuration of the other entities.” [0043])
predicting motion of the object based on at least one of: (“The motion predictor 204 applies the motion model 257 to the states of the other entities to predict a future trajectory of the other entities.” [0043])
the detected location of the object, and a model of object motion; (“the measurements 259 may be taken from various sensors that sense, for example, a position, velocity, acceleration, orientation, or configuration of the autonomous entity.” [0043])
and generating a motion plan for a robot that avoids having the robot collide with the object by: … and generating the motion plan based on: (i) the predicted motion of the object, (“The safety controller 210 generates a safety control signal 211 based on the baseline control signal 225, the predicted motion 205 of the other entities, and the estimated state 253 of the autonomous entity. Here, the safety control signal 211 represents a modification to apply to the baseline trajectory specified in the baseline control signal 225 in order to ensure that the autonomous entity follows a safe trajectory to achieve the goal 251 that avoids collisions with the other entities.” [0045])
Liu does not explicitly teach 
detecting a type;
the image having a reference frame;
determining a transformation between the reference frame of the image and a reference frame of the robot, the determined transformation between the reference frame of the image and the reference frame of the robot enabling conversion from locations in the image to locations in relation to the robot;
determining a transformation between a reference frame of the object and a reference frame of the camera;
generating the motion plan based on: (ii) the determined transformation between the reference frame of the image and the reference frame of the robot, (iii) the determined transformation between the reference frame of the object and the reference frame of the camera, and (iv) dynamic limits of the robot.
	However, Zhang does explicitly teach:
detecting a type; (“The object recognition used in the instant application can be applied directly to the point cloud data to determine the datum, or in some forms the point cloud data can be transformed and/or operated upon to form another data type prior to performing object recognition. For example, information about the CAD model, or information derived from the CAD model, can used to compare to the point cloud data generated from the camera” [0028]; “The point cloud is used in later processing to determine relative positioning of objects in the scene, and to identify features of the scene through object recognition.” [0015])
the image having a reference frame (“To convert coordinate information of the point cloud captured by the camera (and thus expressed relative to a camera centric viewpoint).” [0026])
determining a transformation between the reference frame of the image and a reference frame of the robot, the determined transformation between the reference frame of the image and the reference frame of the robot enabling conversion from locations in the image to locations in relation to the robot; (“the position of objects in the workspace can be translated into robot space given information about the datum relative to the robot 54. … Coordinate information from the camera perspective thus can be readily transformed to the robot perspective once the object recognition is completed. In one nonlimiting embodiment coordinate transformations will be understood to include transformations that involve rotation and/or translation from one origin and reference frame to another origin and reference frame. In some instances the origins and/or reference frame orientations can be similar, but in most instances will both be different. In sum, coordinates in the point cloud expressed relative to a camera centric viewpoint can be transformed to coordinates in the point cloud expressed relative to a robot centric viewpoint. Such transformation provides the robot with a situational awareness of the workspace object so that the robot can elect to interact or avoid the object.” [0029-0030]; “To convert coordinate information of the point cloud captured by the camera (and thus expressed relative to a camera centric viewpoint) to coordinate information expressed relative to the robot, object recognition can be applied to the point cloud (or data derived from the point cloud), and specifically to a datum in the point cloud, where information about the datum is known relative to a frame of reference of the robot. 
Object recognition of the datum and subsequent coordinate conversion into a robot centric frame of reference can assist the robot in maintaining its situational awareness, performing navigation, and/or identifying potential movements that are free of obstruction, etc.” [0026]; “In one form the datum is the robot, or a portion of the robot, whether that portion includes one or more components. For example, a link 76 in a hand of the robot can be used as the datum so long as the object recognition techniques employed are capable of identifying the datum. Other examples of a datum include a tool used by the robot, or a base plate of the robot, to set forth just a few nonlimiting examples.” [0027]; Examiner Interpretation: Regarding the citations of [0029-0030], the transformation between the camera centric and robot centric viewpoints involve determining a transformation between the reference frame of the image and a reference frame of the robot. Regarding the citations of [0026-0027], the datum is interpreted to be the robot ([0026]) and the datum is recognized by the point cloud information (displacement of points in space) in the camera centric viewpoint so as to perform coordinate conversion and therefore is fundamentally determining a transformation between the reference frame of the image and a reference frame of the robot.)
determining a transformation between (“Until object recognition is performed, data from camera image provides point cloud information in camera space. Once the object is recognized, the position of objects in the workspace can be translated into robot space given information about the datum relative to the robot 54.” [0029]; Examiner Interpretation: The point cloud information is position information detected from the camera centric viewpoint. Therefore, before the translation into robot space is performed, the position of recognized objects from the camera centric point cloud is essentially a transformation between a reference frame of the object and a reference frame of the camera.);
and generating the motion plan based on: … (ii) the determined transformation between the reference frame of the image and the reference frame of the robot, (iii) the determined transformation between (“Object recognition of the datum and subsequent coordinate conversion into a robot centric frame of reference can assist the robot in maintaining its situational awareness, performing navigation, and/or identifying potential movements that are free of obstruction, etc.” [0026]; “Until object recognition is performed, data from camera image provides point cloud information in camera space. Once the object is recognized, the position of objects in the workspace can be translated into robot space given information about the datum relative to the robot 54. For example, if the datum is a link on the robot, the position of the link relative to the robot will be known, and subsequent coordinate transformations of the point cloud data will also be known relative to the robot frame of reference so that position information can be provided to the robot about the objects 52, where such information is used to permit the robot to navigate and/or manipulate its surroundings.” [0029]; “Coordinate information from the camera perspective thus can be readily transformed to the robot perspective once the object recognition is completed. … Such transformation provides the robot with a situational awareness of the workspace object so that the robot can elect to interact or avoid the object.” [0030])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified Liu to include the teachings of Zhang to be able to “assist the robot in maintaining its situational awareness, performing navigation, and/or identifying potential movements that are free of obstruction, etc” [0026]. Additionally, the use of frame transformations allows for more freedom of camera mounting locations on or off of the robot system: “In these embodiments the camera 58 can be located and mounted in such a fashion to permit viewing of the robot 54, or a portion of the robot 54, but such limitation may not be required given the description below of a datum useful in determining relative position information in the workspace” [0016].

Zhang also does not explicitly teach
the reference frame of the object
generating the motion plan based on: … (iv) dynamic limits of the robot.
However, Feddema does explicitly teach:
determining a transformation between a reference frame of the object and a reference frame of the camera; and generating the motion plan based on: … (iii) the determined transformation between the reference frame of the object and the reference frame of the camera (“The position and orientation between the robot's end-effector, the cameras, and the part may be represented by 4 x 4 homogeneous transformations.” Pg. 2, col. 3, lines 38-41; Fig. 1 on pg. 2 shows the camera coordinate frames and workpiece (object) coordinate frame; Fig. 2 on page 3 shows a flowchart that includes the determination of the transformation between the workpiece and the camera where it says “estimated workpiece pose with respect to camera B.” in a process that controls a robot; “Then the homogenous transformation of the workpiece with respect to the camera is ^cT_w = ^cT_1 (^wT_1)^-1.” See at least Pg. 4, col. 2, lines 25-28.)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Feddema to improve the flexibility of robot control and reduce downtime for misalignments with objects so that the robot can easily operate when objects are in unpredictable positions. See at least page 1, lines 1-25.
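For reference, the workpiece-to-camera relation quoted from Feddema above can be written in conventional notation as shown below. Reading each leading superscript as the frame in which the pose is expressed and each subscript as the frame being described is an assumption about Feddema's notation. The closed-form inverse is the standard identity for a rigid homogeneous transformation with rotation R and translation p, and is included only to show why no general matrix inversion is needed.

```latex
{}^{c}T_{w} \;=\; {}^{c}T_{1}\,\bigl({}^{w}T_{1}\bigr)^{-1},
\qquad
T = \begin{bmatrix} R & p \\ 0^{\top} & 1 \end{bmatrix}
\;\Longrightarrow\;
T^{-1} = \begin{bmatrix} R^{\top} & -R^{\top} p \\ 0^{\top} & 1 \end{bmatrix}
```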
Feddema also does not explicitly teach
generating the motion plan based on: … (iv) dynamic limits of the robot.
However, Sampedro, does explicitly teach:
generating the motion plan based on: … (iv) dynamic limits of the robot (“In some implementations, collision avoidance trajectories may take the form of safe stopping trajectories achievable by actuators of the robot. In some of those implementations, safe stopping trajectories are trajectories that can be achieved without requiring active braking of actuator(s) and/or without violating constraint(s) of the actuators.” Col. 2, lines 6-12; “In some implementations, the engine 520 may determine collision avoidance trajectories based on a current configuration of the actuators (e.g., as determined based on the current motion state 584), based on a mapping of obstacles in configuration space and/or other data related to obstacles provided by obstacle detection engine 525, based on one or more kinematic constraints 582 associated with one or more robot actuators, and/or based on a target configuration of the actuators based on the target motion state 580.” Col. 20, lines 3-12)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified Liu, Zhang, and Feddema to include the teachings of Sampedro to be able to avoid collisions without damaging the robot. “As used herein, "any collision avoidance trajectory" may refer to any trajectory that may be implemented by any robot actuator to avoid a collision without potentially damaging the robot (e.g., without hard braking or performing some other sudden movement/rotation that violates kinematic constraints of the robot).” Pg. 22, col. 22, lines 60-66
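For illustration only, the notion of a stopping trajectory that stays within actuator constraints can be sketched with a simple one-dimensional check. This sketch is not taken from Sampedro or from the instant application. The constant-deceleration model, the function names, and the parameters (max_decel, margin) are assumptions made for the example.

```python
# Illustrative sketch only: not disclosed by Sampedro or the instant application.
# The constant-deceleration model and all names are hypothetical.


def stopping_distance(speed, max_decel):
    """Distance needed to stop from `speed` at constant deceleration `max_decel`."""
    if max_decel <= 0:
        raise ValueError("max_decel must be positive")
    return speed * speed / (2.0 * max_decel)


def can_stop_within_limits(speed, distance_to_obstacle, max_decel, margin=0.0):
    """True if a stop before the obstacle is possible without exceeding max_decel."""
    return stopping_distance(speed, max_decel) + margin <= distance_to_obstacle


def max_safe_speed(distance_to_obstacle, max_decel, margin=0.0):
    """Largest speed from which a stop within the deceleration limit is still possible."""
    usable = max(distance_to_obstacle - margin, 0.0)
    return (2.0 * max_decel * usable) ** 0.5
```

A planner built along these lines would reject or slow any candidate motion for which can_stop_within_limits returns False, which is one simple way a motion plan can depend on a dynamic limit of the robot.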
Regarding claim 3:
	Liu further teaches:
The method of Claim 1 wherein the object is at least one of: a human; and another robot.; (“Similarly, while the specification described details in the context of operating an autonomous entity in the presence of humans (in the robot arm case) or human driven vehicles (in the autonomous vehicle case), the described methods, apparatuses, and principles described with respect to either or both of these embodiments may also apply to the autonomous entity avoiding other intelligent agents (e.g., animals), human-controlled vehicles or objects, or other autonomously controlled vehicles, objects, or robots.” [0036])
Regarding claim 4:
	Liu further teaches: 
The method of Claim 1 wherein the detected location of the object is a six degree-of-freedom (6DOF) pose, including an x-position, y-position, z-position, roll, pitch, and yaw of the object,; (“For example, the measurements 259 may be taken from various sensors that sense, for example, a position, velocity, acceleration, orientation, or configuration of the autonomous entity.” [0043])
Liu does not explicitly teach reference frames of the image.
Zhang does explicitly teach:
represented in the reference frame of the image; (“Coordinate information from the camera perspective thus can be readily transformed to the robot perspective once the object recognition is completed. In one nonlimiting embodiment coordinate transformations will be understood to include transformations that involve rotation and/or translation from one origin and reference frame to another origin and reference frame.” [0030] Examiner’s interpretation: If coordinate information from the camera can be translated and rotated, then a detected 6DOF pose can be represented in the reference frame of the image.)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Zhang because “object recognition of the datum and subsequent coordinate conversion into a robot centric frame of reference can assist the robot in maintaining its situational awareness, performing navigation, and/or identifying potential movements that are free of obstruction, etc.” [0026]
Regarding claim 5:
	Liu further teaches:
The method of Claim 4 wherein predicting motion of the object is further based on the 6DOF pose of the object as represented in the reference frame of the image; (“Here, the measurements … can be used to predict a state of the other entities that may include, for example, position, velocity, acceleration, orientation, or other configuration of the other entities. The motion predictor 204 applies the motion model 257 to the states of the other entities to predict a future trajectory of the other entities.” [0043])
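For illustration only, predicting a future trajectory of an object from its detected states can be sketched with a constant-velocity model. This sketch is not taken from Liu, whose motion model 257 is not reproduced here. The constant-velocity assumption and the function names are hypothetical and chosen only for brevity.

```python
# Illustrative sketch only: Liu's motion model 257 is not reproduced here.
# A constant-velocity model and all names below are assumptions.


def estimate_velocity(prev_position, curr_position, dt):
    """Finite-difference velocity from two consecutive detected positions."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple((c - p) / dt for p, c in zip(prev_position, curr_position))


def predict_positions(position, velocity, dt, steps):
    """Return predicted (x, y, z) positions at times dt, 2*dt, ..., steps*dt."""
    return [
        tuple(p + v * dt * k for p, v in zip(position, velocity))
        for k in range(1, steps + 1)
    ]
```

With a velocity of zero in every axis, the predicted positions all equal the current position, which corresponds to a prediction of zero motion for a stationary object.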
Regarding claim 12:
	Liu further teaches:
The method of Claim 1 wherein generating the motion plan for the robot further includes at least one of: in real-time, re-planning a previously planned motion plan for the robot (“controller controls motion of an autonomous entity. The controller determines a baseline trajectory for the autonomous entity that achieves a particular goal (e.g., moves the autonomous entity to a particular location). A safety controller then modifies the baseline trajectory to avoid other entities” [Abstract]; “The modularized architecture beneficially 1) treats the efficiency goal and the safety goal separately and allows more freedom in designing robot behaviors, 2) is compatible with existing robot motion control algorithms and can deal with complicated robot dynamics, 3) guarantees real time safety, and 4) is good for parallel computation.” [0034]; Explanation: The efficiency goal is first planned independently. Then the safety goal (object avoidance plan) modifies (or re-plans) the efficiency goal (original path). This is done in real time.)
Liu does not explicitly teach neural networks.
However, Sampedro does explicitly teach:
using a neural network configured to generate a motion plan using the predicted motion of the object as an input.; (“In some implementations, the trajectories may be generated based on output of a neural network model” [0013] Trajectories refer to collision avoidance trajectories in this paragraph; “In some implementations, the engine 520 may determine collision avoidance trajectories based on a current configuration of the actuators (e.g., as determined based on the current motion state 584), based on a mapping of obstacles in configuration space and/or other data related to obstacles provided by obstacle detection engine 525, based on one or more kinematic constraints 582 associated with one or more robot actuators, and/or based on a target configuration of the actuators based on the target motion state 580.” Pg. 21 lines 3-12.; “In those implementations, the engine 520 may have access to updated obstacle information to enable one or more new collision avoidance trajectories to be determined in response to dynamic objects.” Pg. 22, col. 22, lines 11-14)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Sampedro since neural networks can be further trained during operation of the robot and therefore improved in avoiding obstacles in the robot’s particular environment [0006].
Regarding claim 14:
	Liu further teaches:
The method of Claim 1 wherein the predicted motion of the object is zero motion.; (“FIG. 16 shows the case when the automated vehicle suddenly noticed a stationary obstacle 40m ahead.” [0111])
Regarding claim 15:
	Liu further teaches:
The method of Claim 1 wherein the generated motion plan for the robot includes stopping until the object is removed from an environment of the robot.; (“Conservative strategies such as “braking when collision is anticipated,” known as the Automatic Emergency Braking (AEB) function in existing models, are not the best actions in most cases (although they may be necessary in certain cases). Taking into account the dynamics and future course of surrounding vehicles, the automated vehicle should decide between multiple choices for a safe maneuver.” [0007])
Regarding claim 16:
	Liu further teaches:
the predicted motion of the object. (“The motion predictor 204 applies the motion model 257 to the states of the other entities to predict a future trajectory of the other entities.” [0043])
	Liu does not explicitly teach:
wherein the dynamic limits of the robot are a function of …
the detected type of the object and …
	However, Zhang does explicitly teach:
detecting a type; (“The object recognition used in the instant application can be applied directly to the point cloud data to determine the datum, or in some forms the point cloud data can be transformed and/or operated upon to form another data type prior to performing object recognition. For example, information about the CAD model, or information derived from the CAD model, can used to compare to the point cloud data generated from the camera” [0028]; “The point cloud is used in later processing to determine relative positioning of objects in the scene, and to identify features of the scene through object recognition.” [0015])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Zhang because “object recognition of the datum and subsequent coordinate conversion into a robot centric frame of reference can assist the robot in maintaining its situational awareness, performing navigation, and/or identifying potential movements that are free of obstruction, etc” [0026].
Zhang also does not explicitly teach:
wherein the dynamic limits of the robot are a function of …
However, Sampedro does explicitly teach:
Wherein the dynamic limits of the robot are a function of the detected type of the object and the predicted motion of the object; (“In some implementations, collision avoidance trajectories may take the form of safe stopping trajectories achievable by actuators of the robot. In some of those implementations, safe stopping trajectories are trajectories that can be achieved without requiring active braking of actuator(s) and/or without violating constraint(s) of the actuators.” Col. 2, lines 6-12; “In some implementations, the engine 520 may determine collision avoidance trajectories based on a current configuration of the actuators (e.g., as determined based on the current motion state 584), based on a mapping of obstacles in configuration space and/or other data related to obstacles provided by obstacle detection engine 525, based on one or more kinematic constraints 582 associated with one or more robot actuators, and/or based on a target configuration of the actuators based on the target motion state 580.” Col. 20, lines 3-12)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Sampedro to be able to avoid collisions without damaging the robot. “As used herein, "any collision avoidance trajectory" may refer to any trajectory that may be implemented by any robot actuator to avoid a collision without potentially damaging the robot (e.g., without hard braking or performing some other sudden movement/rotation that violates kinematic constraints of the robot).” Pg. 22, col. 22, lines 60-66
Claims 6-7, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), and Wong (IDS: NPL).
Regarding claim 6:
	Liu does not explicitly teach
wherein detecting the location of the object includes: processing the image of the object through a convolutional neural network to predict one or more parts of the object forming a two-dimensional (2D) position of the object in the image; and determining the 6DOF pose using the 2D position of the object in the image, pixels of the object, and a depth map corresponding to the image of the object.;
	However, Wong further teaches:
wherein detecting the location of the object includes: processing the image of the object through a convolutional neural network to predict one or more parts of the object forming a two-dimensional (2D) position of the object in the image; and determining the 6DOF pose using the 2D position of the object in the image, pixels of the object, and a depth map corresponding to the image of the object.; (Examiner Summary: SegICP uses a CNN to predict the object and 6DOF pose. It passes RGB frames and depth maps through the CNN. (RGB frames are 2D). Pgs. 2-3, III. Technical Approach, Paragraphs 1 and 2)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Wong to be able to achieve accurate and real-time 6-DOF pose estimation for relevant objects. “SegICP couples convolutional neural networks and multi-hypothesis point cloud registration to achieve both robust pixel-wise semantic segmentation as well as accurate and real-time 6-DOF pose estimation for relevant objects. Our architecture achieves 1 cm position error and < 5° angle error in real time” [Abstract].
Regarding claim 7:
	Liu does not explicitly teach
wherein determining the 6DOF pose using the 2D position of the object, the depth map corresponding to the image of the object, and the pixels of the object includes: fitting the depth map to a candidate three-dimensional (3D) model of the object, where dimensions of the 3D model match dimensions of the object.
	However, Wong further teaches:
wherein determining the 6DOF pose using the 2D position of the object, the depth map corresponding to the image of the object, and the pixels of the object includes: fitting the depth map to a candidate three-dimensional (3D) model of the object, where dimensions of the 3D model match dimensions of the object.; (“SegICP achieves this using an object library approach to perception, referencing scanned 3D models of known objects, and performs 3D point cloud matching against cropped versions of these mesh models. In our architecture, as outlined in Figure 2, RGB frames are first passed through a CNN which outputs a segmented mask with pixel-wise semantic object labels.” Pg. 3, III. Technical Approach)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Wong to robustly deal with prolonged occlusions in image processing. “SegICP acquires and tracks the 6-DOF pose of each detected object, operating at ~70ms per frame (270ms during initialization phase) with 1 cm position error and < 5° angle error, and can robustly deal with prolonged occlusions and potential outliers in the segmentation with a Kalman filter.” Page 3, III. Technical Approach.
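For illustration only, the relationship between a pixel-wise object mask, a depth map, and a 3D point set that could then be fitted to a 3D model can be sketched as follows. This is a minimal sketch under stated assumptions, not the SegICP implementation and not the method of the instant application: the function name `backproject_masked_pixels`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the sample values are all hypothetical, and the model-fitting (registration) step itself is not shown.

```python
def backproject_masked_pixels(depth, mask, fx, fy, cx, cy):
    """Convert object pixels plus depth into camera-frame 3D points.

    depth: rows of depth values (assumed metres; 0 means "no reading").
    mask:  rows of truthy/falsy flags marking pixels labeled as the object.
    fx, fy, cx, cy: hypothetical pinhole camera intrinsics.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, is_object) in enumerate(zip(depth_row, mask_row)):
            # Keep only pixels that belong to the object and have valid depth.
            if is_object and z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


# Made-up 2x2 example: only the top-right pixel is labeled as the object.
depth_map = [[2.0, 2.0], [2.0, 2.0]]
object_mask = [[0, 1], [0, 0]]
object_points = backproject_masked_pixels(depth_map, object_mask, 1.0, 1.0, 0.0, 0.0)
```

The resulting points are expressed in the camera reference frame; a separate registration step would be needed to align them with a stored 3D model and recover a 6DOF pose.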
Regarding claim 11:
	Liu does not explicitly teach
wherein detecting the type of the object comprises at least one of: processing the image of the object using a neural network classifier configured to predict object type based on one or more images of an object
	However, Wong further teaches:
wherein detecting the type of the object comprises at least one of: processing the image of the object using a neural network classifier configured to predict object type based on one or more images of an object; (“SegICP which uses a neural network predicts objects that are relevant to the task based on image(s).” [Fig. 2])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Wong because “recent work in computer vision has shown that convolutional neural networks (CNNs) considerably improve image classification.” Pg. 2, Object Recognition Paragraph.
Regarding claim 18:
	Liu does not explicitly teach
wherein the image of the object is at least one of: a depth image of the object; and a red, green, blue (RGB) image of the object.;
	However, Wong further teaches:
wherein the image of the object is at least one of: a depth image of the object; and a red, green, blue (RGB) image of the object.; (Examiner Summary: SegICP uses an RGB-D sensor, which captures both RGB and depth images. [Fig. 1])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Wong because RGB and depth images can be used to determine the object and the pose of the object. Pg. 2, Pose Estimation Paragraph.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), Wong (IDS: NPL), and Djuric (US 20190049970 A1).
Regarding claim 8:
	Liu does not explicitly teach
wherein detecting the location of the object comprises: processing the image of the object using a neural network to directly determine a six degree-of-freedom pose of the object in the reference frame of the image where, processing the image of the object using the neural network includes at least one of:;
determining pixels in the image belonging to the object and determining the location of the object by matching the determined pixels to a silhouette
	However, Wong further teaches: 
wherein detecting the location of the object comprises: processing the image of the object using a neural network to directly determine a six degree-of-freedom pose of the object in the reference frame of the image where, processing the image of the object using the neural network includes at least one of:; (Examiner summary: SegICP uses a CNN to predict the 6DOF pose. It passes RGB frames and depth maps through the CNN. Pgs. 2-3, III. Technical Approach, Paragraphs 1 and 2.)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Wong to be able to achieve accurate and real-time 6-DOF pose estimation for relevant objects. “SegICP couples convolutional neural networks and multi-hypothesis point cloud registration to achieve both robust pixel-wise semantic segmentation as well as accurate and real-time 6-DOF pose estimation for relevant objects. Our architecture achieves 1 cm position error and < 5° angle error in real time” [Abstract].
Wong also does not explicitly teach
determining pixels in the image belonging to the object and determining the location of the object by matching the determined pixels to a silhouette
However, Djuric further teaches:
determining pixels in the image belonging to the object and determining the location of the object by matching the determined pixels to a silhouette; (“Each image can be, for example, a rasterized image where each pixel corresponds to a spatial area or volume (e.g., certain coordinates, etc.)”… “In some implementations, the plurality of channels of an image can include a plurality of color channels. Each color channel can be associated with a different color and each color channel can be encoded with a different feature. For example, a sidewalk can be represented by an orange color, while a roadway can be represented by a grey color.” [0029]. Examiner interpretation: Obstacles in the environment are detected by taking an image and matching the pixels to the particular area, which is the object, involving a colored silhouette of the area/object.)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Djuric because images where each pixel corresponds to a spatial area or volume (e.g., certain coordinates, etc.) “can allow for more focused and efficient processing of the images.” [0058]
Claims 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), and Djuric (US 20190049970 A1).
Regarding claim 9:
	Liu further teaches: 
The method of Claim 1 wherein predicting motion of the object based on at least one of the detected type of the object, the detected location of the object, and the model of object motion comprises at least one of: (“The motion predictor 204 applies the motion model 257 to the states of the other entities to predict a future trajectory of the other entities” … “the measurements 259 may be taken from various sensors that sense, for example, a position, velocity, acceleration, orientation, or configuration of the autonomous entity.” [0043])
Liu does not explicitly teach neural networks.
Djuric does explicitly teach:
processing the image of the object using a neural network trained to estimate future position of the object based on at least one image of the object having a reference frame; processing the image of the object using a neural network trained to determine motion of the object based on a successive set of images; (Examiner summary: An image or plurality of images is inputted into a machine-learned model (can be neural network) to predict one or more future locations of the object based on state data. State data can be estimated location, velocity, acceleration, heading, orientation, size, and/or class. Using geographic data, the object is known relative to the geographic area. This has been interpreted that the object has a reference frame because the position of the object is known with respect to the geographic area. [0021], [0025], and [0027]; “the combined data set can include image data associated with a plurality of times.” [0021])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Djuric because machine-learned models improve the ability to predict future object locations with more accuracy and precision than existing computers. [0036]
Regarding claim 13:
	Liu further teaches:
The method of Claim 1 wherein predicting motion of the object further comprises predicting collision of the robot and the object using at least one of:; (Examiner Summary: Safety controller modifies the baseline trajectory to avoid other entities based on a prediction of their motion. [Abstract])
Liu does not explicitly teach neural networks.
Djuric does explicitly teach:
a neural network; (Examiner Summary: Object trajectory is predicted with a neural network for avoidance. In an example, the path of a pedestrian is predicted to veer off the sidewalk because of an obstruction. Pgs. 13-14, [0033]. Examiner explanation: In this example, the neural network first predicts motion of the pedestrian and a collision between the pedestrian and obstruction in order to adjust the path of the pedestrian motion model.)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Djuric because machine-learned models improve the ability to predict future object locations with more accuracy and precision than existing computers, which enables improved motion planning. Pg. 14, [0036].
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), Djuric (US 20190049970 A1), and Millard (US 11016491 B1).
Regarding claim 10:
	Liu does not explicitly teach
further including: selecting equations to include in the physics-based dynamics model of the object from a set of plausible models by matching the detected type of the object to an element in a library of models.;
	However, Millard further teaches:
further including: selecting equations to include in the physics-based dynamics model of the object from a set of plausible models by matching the detected type of the object to an element in a library of models.; (Examiner Summary: A motion model is selected from a library based on the predicted type of object. Pg. 13, Lines 22-30; “Motion models for dynamic obstacles may specify travel characteristics for the obstacle, such as expected velocities, accelerations, trajectories, directional changes, or a combination of these and/or other characteristics.” Lines 45-48)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Millard because a library of motion models stored on a storage device can hold models of objects that are typical of the environment in which the robot operates. This allows motion to be predicted more easily and with less processing, since the detected type of object need only be matched to the stored motion model that corresponds to that object type. Pg. 13, Col. 10, lines 22-48.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), and Millard (US 11016491 B1).
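For illustration only, matching a detected object type to an element in a library of motion models can be sketched as a keyed lookup. This is a minimal sketch under stated assumptions and is not taken from Millard or from the instant application: `MotionModel`, `MOTION_LIBRARY`, `DEFAULT_MODEL`, `select_motion_model`, `predict_position`, the object types, and every numeric limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionModel:
    """Hypothetical travel characteristics for one class of obstacle."""
    max_speed: float  # metres per second (assumed units)
    max_accel: float  # metres per second squared (assumed units)


# Hypothetical library keyed by detected object type; all values are made up.
MOTION_LIBRARY = {
    "human": MotionModel(max_speed=2.0, max_accel=3.0),
    "forklift": MotionModel(max_speed=5.0, max_accel=1.5),
}

# Conservative fallback for types missing from the library (an assumption).
DEFAULT_MODEL = MotionModel(max_speed=10.0, max_accel=10.0)


def select_motion_model(object_type):
    """Match the detected type to a library element, else use the fallback."""
    return MOTION_LIBRARY.get(object_type, DEFAULT_MODEL)


def predict_position(position, velocity, model, dt):
    """Constant-velocity prediction with speed clamped to the model's limit."""
    speed = (velocity[0] ** 2 + velocity[1] ** 2) ** 0.5
    scale = 1.0 if speed <= model.max_speed else model.max_speed / speed
    return (position[0] + velocity[0] * scale * dt,
            position[1] + velocity[1] * scale * dt)


# A measured speed of 5 m/s is clamped to the hypothetical human limit of 2 m/s.
human_model = select_motion_model("human")
predicted = predict_position((0.0, 0.0), (3.0, 4.0), human_model, 1.0)
```

The lookup replaces any per-object model fitting with a single dictionary access, which is the processing saving the rationale above refers to.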
Regarding claim 17:
	Liu does not explicitly teach
further comprising: selecting the model of object motion based on the detected type of the object
	However, Millard further teaches:
further comprising: selecting the model of object motion based on the detected type of the object; (“The motion models may be pre-defined based on historical analysis of the movements of different types (e.g., classes) or instances of obstacles” … “To select appropriate motion models from the library 224, the obstacle location prediction engine 218 can include an object recognizer 220 that recognizes obstacles represented in occupancy maps 216.” Pg. 14)
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Millard because a library of motion models based on object class can be used with a predictive model to generate obstacle location predictions. Pg. 14, col. 11, lines 13-17.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (WO 2017/197170 A1) in view of Zhang (US 20180275632 A1), Feddema (NPL: Model-Based Visual Feedback Control for a Hand-Eye Coordinated Robotic System), Sampedro (US 10131053 B1), and Kuffner (US 20160016315 A1).
Regarding claim 21:
	Liu teaches modeling a human with geometric ‘capsules’ ([0067]), predicting the human’s time varying motion ([0068]), and modifying a motion plan based on the predicted motion ([0045]).
	Liu does not explicitly teach modeling the geometric ‘capsules’ in a time varying manner or basing them on the predicted motion. Therefore, Liu does not explicitly teach
determining a time varying model of the object based on the predicted motion of the object; wherein the motion plan is further generated based on the determined time varying model of the object.
	However, Kuffner further teaches:
determining a time varying model of the object based on the predicted motion of the object; (“Given a predicted trajectory of the object/human, the robotic device may compare that trajectory with its own estimated trajectory and determine to what degree the path and volume of space that the robotic device will occupy within the following predetermined time period will intersect with the path and volume of space that the object/human will occupy within that same predetermined time period.” [0021]; Examiner Interpretation: The determined volume of space that the object/human will occupy within a predetermined time period according to its predicted trajectory is the time varying model of the object.)
wherein the motion plan is further generated based on the determined time varying model of the object. (“The robotic device can then adjust (e.g., expand or shrink) its virtual safety cage based on potential intersection between the two paths/volumes (or potential lack thereof).” [0021]; “FIG. 6 illustrates example operations of the example method being performed, in accordance with at least some embodiments described herein. In this example, a computing device (not shown) may determine that, in an upcoming predetermined period of time (e.g., one second), a robotic arm 600 of a robotic device may move along a trajectory 602 that will intersect with another trajectory 604 of a moving vehicle 606. At or before making this determination, one or more dimensions of the virtual safety cage 608 of the robotic arm 600 may be bounded in an ellipsoidal manner, as shown. However, upon making this determination, the computing device may adjust the trajectory of the robotic arm 600, where the adjusted trajectory 610 is such that the robotic arm 600 will come to an emergency stop at a location that does not intersect with the trajectory 604 of the moving vehicle 606. The computing device may also adjust the virtual safety cage accordingly, where the adjusted virtual safety cage 612 is bounded spherically in one or more dimensions, as shown. Furthermore, in some scenarios, the computing device may also determine a virtual safety cage 614 for the moving vehicle 606 and may determine the adjusted virtual safety cage 612 such that the adjusted virtual safety cage does not intersect with the moving vehicle's virtual safety cage.” [0099])
It would have been obvious to one of ordinary skill in the art at the time of filing to have modified modified Liu to include the teachings of Kuffner so that robots and humans “can avoid interference with one another or otherwise make efficient use of their shared physical environment” [0023]. Also see at least [0029].
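For illustration only, comparing the volume a robot will occupy with the volume an object will occupy over the same time period can be sketched as below. This is a minimal sketch under stated assumptions, not Kuffner's method and not the method of the instant application: the bounding spheres, the constant-velocity paths, and the names `occupied_spheres` and `first_conflict_time` are hypothetical simplifications.

```python
import math


def occupied_spheres(start, velocity, radius, horizon, step):
    """Sample a time varying occupancy model as (time, center, radius) tuples.

    Assumes a constant-velocity path and a fixed bounding-sphere radius.
    """
    samples = int(round(horizon / step))
    return [
        (i * step,
         tuple(s + v * i * step for s, v in zip(start, velocity)),
         radius)
        for i in range(samples + 1)
    ]


def first_conflict_time(robot_model, object_model):
    """Return the first sampled time at which the two volumes overlap, or None.

    Both models must be sampled with the same horizon and step.
    """
    for (t, robot_center, robot_radius), (_, obj_center, obj_radius) in zip(
            robot_model, object_model):
        if math.dist(robot_center, obj_center) <= robot_radius + obj_radius:
            return t
    return None


# Made-up head-on example: the spheres first touch at t = 1.5 seconds.
robot = occupied_spheres((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.5, 2.0, 0.5)
obstacle = occupied_spheres((4.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 0.5, 2.0, 0.5)
conflict = first_conflict_time(robot, obstacle)
```

A planner could use the returned time to decide whether the motion plan needs to be adjusted within the look-ahead period; how that adjustment is made is outside this sketch.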
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cristiano (US 20160291571 A1) is pertinent because it discusses transforming a camera reference frame to a robot tool reference frame.
Azizian (US 20140163736 A1) is pertinent because it discusses determining the configuration and position of a second robot arm relative to a reference frame of the manipulatable device by using a transform to translate points in the reference frame of an image capturing device to the reference frame of the manipulatable device.
Dedkov (US 20200073358 A1) is pertinent because it discusses a similar invention using cameras and reference frame transformations to ensure a robot doesn’t collide with a human.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Karston G Evans whose telephone number is (571)272-8480.  The examiner can normally be reached on Mon-Fri 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abby Lin, can be reached at (571)270-3976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/K.G.E./ Examiner, Art Unit 3664
/ABBY Y LIN/ Supervisory Patent Examiner, Art Unit 3664