DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/19/2022 has been entered.
Response to Amendment
Applicant's amendments and remarks submitted 1/19/2022 have been entered and considered, but are not found convincing. Claims 1, 9, and 17 have been amended. In summary, claims 1-20 are pending in the application.
Response to Arguments
Claim Rejection - 35 U.S.C. 103
Applicant’s arguments with respect to the independent claims have been considered but are moot because the rejection has been modified to address the newly added limitations. The examiner now relies on Wright or Doris for the argued limitation.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1.	Claims 1, 8-9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al., U.S. Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Wright et al., U.S. Patent Application Publication No. 20200394012 (“Wright”).
Regarding independent claim 1, Gauglitz teaches a remote assistance system (Fig.2), comprising: 
a wearable visual enhancement device at a first location (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display”, where the head-worn display is considered a wearable visual enhancement device) configured to: 
scan a scene in a real world in a forward field-of-view of a first user (¶0031 “FIG. 1 shows an example live Augmented Reality-based remote collaboration system 100, according to an embodiment. FIG. 1 shows the user in physical location A in front of a car engine, identifying a particular element, which the user in physical location B has marked with the yellow dot.”; ¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”),
generate sensor data associated with one or more objects in the scene, and transmit the sensor data (¶0043 “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system”);  and 


Fig. 2 of Gauglitz
a computing system at a second location (¶0041 “FIG. 2 shows an overview of the system architecture 200, according to an embodiment. System architecture 200 shows both the local user's system on top and the remote user's system on bottom. In an example, the local user's system may be running on an Android-based lightweight tablet or smartphone, and the remote user's system may be running on a commodity PC with Ubuntu”) configured to: 
receive the sensor data (¶0047 “The system consists of five main modules--network module, 3D modeler, camera control, annotation control, renderer--and the framework to hold them together”; ¶0049 “The network module receives the data stream …”),
generate a 3D scene including 3D models of the one or more objects (¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within some tolerance as tentative 3D points. If a feature has previously been matched to features from other frames, we check for mutual epipolar consistency of all observations and merge them into a single 3D point if possible; otherwise, the two 3D points remain as competing hypotheses”; ¶0068 “The renderer renders the scene using the 3D model, the continually updated keyframes, the incoming live camera frame (including live camera pose), the virtual camera pose, and the annotations”; ¶0076 “Currently, the 3D model is available only on the remote user's side”, where the 3D model is available only on the remote user's side, and Fig. 2 shows that the five main modules (network module, 3D modeler, camera control, annotation control, renderer) are on the remote user's side), 
receive, via input by a second user, a mark associated with one of the 3D models (¶0066 “The remote user sets a marker by simply left-clicking into the view …”), and 
transmit only information that identifies the mark to the wearable visual enhancement device, wherein the wearable visual enhancement device is further configured to: display the mark adjacent to the object corresponding to the one of the 3D models (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”). Gauglitz is understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor, Wright teaches receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal (¶0168 “…Alternatively, or additionally, the user may provide feedback as steps are completed to indicate to the system or user device that an annotation may be removed”; ¶0256 “The system might be able to take a verbal command, verbal statement from the user or from the helpee and remove annotations or something based off that so open this panel, okay, open the panel and then maybe the system recognizes that and removes the annotation”, where the user providing feedback, or the helpee using a verbal command, to indicate removal of the annotations is considered as receiving an edit signal to modify the displayed mark).

 Thus, the combination of Gauglitz and Wright teaches a remote assistance system, comprising: a wearable visual enhancement device at a first location configured to: scan a scene in a real world in a forward field-of-view of a first user, generate sensor data associated with one or more objects in the scene, and transmit the sensor data; and a computing system at a second location configured to: receive the sensor data, generate a 3D scene including 3D models of the one or more objects, receive, via input by a second user, a mark associated with one of the 3D models, and transmit only information that identifies the mark to the wearable visual enhancement device, wherein the wearable visual enhancement device is further configured to: display the mark adjacent to the object corresponding to the one of the 3D models, receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal.
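For illustration only, the limitation "transmit only information that identifies the mark" together with Gauglitz's annotations "saved in 3D world coordinates" (¶0064) can be sketched as follows. This is a minimal sketch and not code from either reference: the Mark fields, the JSON encoding, and the unrotated pinhole camera are all assumptions chosen for brevity.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Mark:
    """Hypothetical mark record; field names are illustrative assumptions."""
    mark_id: int
    world_xyz: tuple  # anchor point in 3D world coordinates
    label: str


def encode_mark(mark):
    """Serialize only the mark itself (no video frames, no 3D model)."""
    return json.dumps(asdict(mark))


def decode_mark(payload):
    """Rebuild the mark on the wearable device from the received payload."""
    d = json.loads(payload)
    return Mark(d["mark_id"], tuple(d["world_xyz"]), d["label"])


def project(world_xyz, cam_pos, focal_px, cx, cy):
    """Pinhole projection of a world point into display pixels.

    Assumes a camera at cam_pos looking down +Z with no rotation,
    which is a simplification of a tracked camera pose.
    Returns None when the point is behind the camera.
    """
    x = world_xyz[0] - cam_pos[0]
    y = world_xyz[1] - cam_pos[1]
    z = world_xyz[2] - cam_pos[2]
    if z <= 0:
        return None
    return (cx + focal_px * x / z, cy + focal_px * y / z)
```

In this sketch the wearable device would call decode_mark on the received payload and then project the anchor point with its own current camera position to place the mark next to the corresponding object.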
Regarding claim 8, Gauglitz and Wright teach the remote assistance system of claim 1, wherein the computing system is further configured to adjust a virtual perception of the second user in the 3D scene in response to users inputs from the second user (¶0060-0061 of Gauglitz “ The user can also zoom into and out of the view with the scroll wheel. Zooming is implemented as a change of the virtual camera's field of view (rather than dollying) to avoid having to deal with corrections for parallax or occlusions from objects behind the original camera position.  [0061] The present subject matter provides click to change viewpoint capabilities. When the user right-clicks into the view, we compute the 3D hit point, and subsequently find the camera whose optical axis is closest to this point (which may be the current camera as well). This camera is transitioned and yaw and pitch adapted such that the new view centers on the clicked-upon point. This allows the user to quickly center on a nearby point as well as quickly travel to a faraway point with a single click.”) 
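For illustration only, the two view controls quoted from ¶0060-0061 of Gauglitz (zooming as a change of the virtual camera's field of view, and selecting the camera whose optical axis is closest to the clicked 3D point) can be sketched as follows. This is a minimal sketch under stated assumptions: the zoom factor, the field-of-view limits, and the representation of a camera as a (position, unit direction) pair are hypothetical and are not taken from the reference.

```python
import math


def zoom_fov(fov_deg, scroll_steps, factor=0.9, min_fov=5.0, max_fov=120.0):
    """Zoom by narrowing or widening the field of view (no camera dollying).

    Positive scroll_steps zoom in; factor and the clamp limits are assumptions.
    """
    new_fov = fov_deg * (factor ** scroll_steps)
    return max(min_fov, min(max_fov, new_fov))


def distance_to_axis(cam_pos, cam_dir, point):
    """Distance from point to the ray from cam_pos along unit vector cam_dir."""
    v = [p - c for p, c in zip(point, cam_pos)]
    # Project onto the axis; clamp so points behind the camera use cam_pos.
    t = max(0.0, sum(a * b for a, b in zip(v, cam_dir)))
    closest = [c + t * d for c, d in zip(cam_pos, cam_dir)]
    return math.dist(point, closest)


def pick_camera(cameras, hit_point):
    """Return the index of the camera whose optical axis passes nearest hit_point.

    cameras is a list of (position, unit_direction) pairs.
    """
    return min(
        range(len(cameras)),
        key=lambda i: distance_to_axis(cameras[i][0], cameras[i][1], hit_point),
    )
```

After pick_camera returns, the yaw and pitch adaptation described in ¶0061 would re-center the view on the clicked point; that step is omitted here.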
Regarding independent claim 9, Gauglitz teaches a method for remote assistance, comprising: 
scanning, by a wearable visual enhancement device at a first location, a scene in a real world in a forward field-of-view of a first user (¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”);
generating, by the wearable visual enhancement device, sensor data associated with one or more objects in the scene (¶0043 “Under the hood, the …”);
generating, by a computing system at a second location, a 3D scene including 3D models of the one or more objects (¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within some tolerance as tentative 3D points. If a feature has previously been matched to features from other frames, we check for mutual epipolar consistency of all observations and merge them into a single 3D point if possible; otherwise, the two 3D points remain as competing hypotheses”; ¶0068 “The renderer renders the scene using the 3D model, the continually updated keyframes, the incoming live camera frame (including live camera pose), the virtual camera pose, and the annotations”; ¶0076 “Currently, the 3D model is available only on the remote user's side”, where the 3D model is available only on the remote user's side, and Fig. 2 shows that the five main modules (network module, 3D modeler, camera control, annotation control, renderer) are on the remote user's side);
receiving, via input to the computing system by a second user, a mark associated with one of the 3D models (¶0066 “The remote user sets a marker by …”);
transmitting, by the computing system, only information that identifies the mark to the wearable visual enhancement device; and displaying, by the wearable visual enhancement device, the mark adjacent to the object corresponding to the one of the 3D models (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”). Gauglitz is understood to be silent on the remaining limitations of claim 9.
In the same field of endeavor, Wright teaches receiving, by the wearable visual enhancement device, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information; and modifying, by the wearable visual enhancement device, the displayed mark in accordance with the received edit signal (¶0168 “…Alternatively, or additionally, the user may provide feedback as steps are completed to indicate to the system or user device that an annotation may be removed”; ¶0256 “The system might be able to take a verbal command, verbal statement from the user or from the helpee and remove annotations or something based off that so open this panel, okay, open the panel and then maybe the system recognizes that and removes the annotation”, where the user provides feedback, or the helpee uses a verbal command, to indicate removal of the annotations, which is …)
Thus, the combination of Gauglitz and Wright teaches a method for remote assistance, comprising: scanning, by a wearable visual enhancement device at a first location, a scene in a real world in a forward field-of-view of a first user; generating, by the wearable visual enhancement device, sensor data associated with one or more objects in the scene; generating, by a computing system at a second location, a 3D scene including 3D models of the one or more objects; receiving, via input to the computing system by a second user, a mark associated with one of the 3D models; transmitting, by the computing system, only information that identifies the mark to the wearable visual enhancement device; displaying, by the wearable visual enhancement device, the mark adjacent to the object corresponding to the one of the 3D models; receiving, by the wearable visual enhancement device, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information; and modifying, by the wearable visual enhancement device, the displayed mark in accordance with the received edit signal.
Attorney Docket No.: 81023-000035
Regarding claim 16, Gauglitz and Wright teach the method of claim 9, further comprising adjusting, by the computing system, a virtual perception of the second user in the 3D scene in response to users inputs from the second user (¶0060-0061 of Gauglitz “The user can also zoom into and out of the view with the scroll wheel. Zooming is implemented as a change of the virtual camera's field of view (rather than dollying) to avoid having to deal with corrections for parallax or occlusions …”).
2.	Claims 2-3, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al., U.S. Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Wright et al., U.S. Patent Application Publication No. 20200394012 (“Wright”), further in view of Smith et al., U.S. Patent No. 9088787 (“Smith”), further in view of Naimark, U.S. Patent Application Publication No. 2013/0218461 (“Naimark”).
Regarding claim 2, Gauglitz and Wright teach the remote assistance system of claim 1, wherein the wearable visual enhancement device includes an inertial measurement unit (IMU) configured to collect acceleration of the wearable visual enhancement device (¶0028 of Wright “Computer 210 may include or have access to a computing environment that includes input 216, output 218, and a communication connection 220. The input 216 may include one or more of a touchscreen, touchpad, one or more cameras, mouse, keyboard, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 210, and other input devices. The input 216 may further include inertial measurement unit (IMU), which may include an accelerometer and/or a gyroscope. As discussed below, the computer 210 may use the camera and/or the IMU to determine …”).
In the same field of endeavor, Smith teaches wherein the wearable visual enhancement device includes a camera configured to collect color information of a color image of the scene, a depth camera configured to collect distance information of a depth image of the scene (col. 4, lines 52-59 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col. 5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”), and an inertial measurement unit (IMU) configured to collect velocity of the wearable visual enhancement device (col. 7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of …”).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of Gauglitz, as modified by Wright, for providing an annotation created at the remote user's device to the local user's device, by using the HWD device of Smith configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor at the local user's side, because this modification would provide raw color and depth data of an object at a certain level of discretization size (col. 4, lines 52-59 of Smith). Gauglitz, Wright and Smith are understood to be silent on the remaining limitations of claim 2.
In the same field of endeavor, Naimark teaches an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the device (¶0018 “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 …”).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of Gauglitz, as modified by Wright and Smith, for providing an annotation created at the remote user's device to the local user's device, to include the inertial motion unit (IMU) 113 of Naimark, a sensing system composed of several inertial sensors including an accelerometer, gyroscope, and magnetometers, because this modification would provide six degrees of freedom (3 translation-related values and 3 rotation-related values) and determine the pose of the system relative to the real world objects viewed by the camera (¶0018 of Naimark).
Thus, the combination of Gauglitz, Wright, Smith and Naimark teaches wherein the wearable visual enhancement device includes a camera configured to collect color information of a color image of the scene, a depth camera configured to collect distance information of a depth image of the scene, and an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the wearable visual enhancement device.
Regarding claim 3, Gauglitz, Wright, Smith and Naimark teach the remote assistance system of claim 2, wherein the wearable visual enhancement device includes a tracker configured to generate degree of freedom (DoF) information at least partially based on the acceleration and angular velocity (col. 7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed …”; ¶0018 of Naimark “… The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 2.
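For illustration only, generating six degree-of-freedom information from the linear acceleration and angular velocity described in ¶0018 of Naimark can be sketched as a simple dead-reckoning integration. This is a minimal sketch and not the method of any cited reference: it assumes gravity has already been removed from the accelerometer readings, that accelerations are expressed in the world frame, and that rotations are small enough to integrate each axis independently. The function name and sample format are hypothetical.

```python
def integrate_imu(samples, dt):
    """Integrate IMU samples into 3 translation and 3 rotation values.

    samples: iterable of (accel_xyz, gyro_xyz) pairs, with acceleration in
             m/s^2 (gravity removed, world frame: assumptions) and angular
             velocity in rad/s.
    dt:      fixed sample period in seconds.
    Returns (position_xyz, rotation_xyz), i.e. six degrees of freedom.
    """
    position = [0.0, 0.0, 0.0]
    velocity = [0.0, 0.0, 0.0]
    rotation = [0.0, 0.0, 0.0]  # small-angle roll, pitch, yaw in radians
    for accel, gyro in samples:
        for i in range(3):
            velocity[i] += accel[i] * dt     # acceleration -> velocity
            position[i] += velocity[i] * dt  # velocity -> translation
            rotation[i] += gyro[i] * dt      # angular velocity -> rotation
    return position, rotation
```

A practical tracker would typically fuse such an estimate with camera-based tracking to limit drift; the sketch shows only how the two IMU quantities map to translation-related and rotation-related values.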
Regarding claim 10, Gauglitz and Wright teach the method of claim 9, further comprising: collecting, by an inertial measurement unit (IMU), acceleration of the wearable visual enhancement device (¶0028 of Wright “Computer 210 may include or have access to a computing environment that includes input 216, output 218, and a communication connection 220. The input 216 may include one or more of a touchscreen, touchpad, one or more cameras, mouse, keyboard, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 210, and other input devices. The input 216 may further include inertial measurement unit (IMU), which may include an accelerometer and/or a gyroscope. As discussed below, the computer 210 may use the camera and/or the IMU to determine whether the computer has been put down (e.g., placed on a surface such as a table) or placed in the local user's pocket or a bag.”). In addition, the same motivation is used as the rejection for claim 1. Gauglitz and Wright are understood to be silent on the remaining limitations of claim 10.
In the same field of endeavor, Smith teaches collecting, by a camera of the wearable visual enhancement device, color information of a color image of the scene; collecting, by a depth camera of the wearable visual enhancement device, distance information of a depth image of the scene (col. 4, lines 52-59 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col. 5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”); and collecting, by an inertial measurement unit (IMU), velocity of the wearable visual enhancement device (col. 7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”). In addition, the same motivation is used as the rejection for claim 2. Gauglitz, Wright and Smith are understood to be silent on the remaining limitations of claim 10.
In the same field of endeavor, Naimark teaches an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the device (¶0018 “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”) In addition, the same motivation is used as the rejection for claim 2.
Thus, the combination of Gauglitz, Wright, Smith and Naimark teaches the method further comprising: collecting, by a camera of the wearable visual enhancement device, color information of a color image of the scene; collecting, by a depth camera of the wearable visual enhancement device, distance information of a depth image of the scene; and collecting, by an inertial measurement unit (IMU), acceleration and angular velocity of the wearable visual enhancement device.
Regarding claim 11, Gauglitz, Wright, Smith and Naimark teach the method of claim 10, further comprising generating, by a tracker, degree of freedom (DoF) information at least partially based on the acceleration and angular velocity (col. 7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 2.
Regarding independent claim 17, Gauglitz teaches a wearable visual enhancement device (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display”), comprising:
a near eye display (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”), a processor, and a non-transitory computer readable medium that store instructions, when executed by the processor (¶0030 “The functions or algorithms described herein may be implemented in hardware, software, or a combination of software and hardware. The software comprises computer executable instructions stored on computer readable media such as memory or other type of storage devices. Further, described functions may correspond to modules, which may …”), causes the processor to:
scan a scene in a real world in a forward field-of-view of a first user by the camera (¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”), 
generate sensor data associated with one or more objects in the scene and transmit the sensor data to a computing system at a second location (¶0043 “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system. The local user's system receives information about annotations from the remote system and uses this information together with the live video to render the augmented view.”);
receive, from the computing system at the second location, only information that identifies a mark associated with a first object in the scene (¶0066 “The remote user sets a marker by simply left-clicking into the view (irrespective …”) and display the mark adjacent to the first object by the near-eye display (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”). Gauglitz is understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Wright teaches an inertial measurement unit (IMU) configured to collect acceleration of the wearable visual enhancement device (¶0028 of Wright “Computer 210 may include or have access to a computing environment that includes input 216, output 218, and a communication connection 220. The input 216 may include one or more of a touchscreen, touchpad, one or more cameras, mouse, keyboard, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 210, and other input devices. The input 216 may further include inertial measurement unit (IMU), which may include an accelerometer and/or a gyroscope. As discussed below, the computer 210 may use the camera and/or the IMU to determine whether the computer has been put down (e.g., placed on a surface such as a table) or placed in the local user's pocket or a bag.”);
receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal (¶0168 “…Alternatively, or additionally, the user may provide feedback as steps are completed to indicate to the system or user device that an annotation may be removed”; ¶0256 “The system might be able to take a verbal command, verbal statement from the user or from the helpee and remove annotations or something based off that so open this panel, okay, open the panel and then maybe the system recognizes that and removes the annotation”, where the user providing feedback, or the helpee using a verbal command, to indicate removal of the annotations is considered as receiving an edit signal to modify the displayed mark). In addition, the same motivation is used as the rejection for claim 1. Gauglitz and Wright are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Smith teaches a wearable visual enhancement device (col.4, lines 4-7 “Head Wearable Display ("HWD") Remote Assistant System ("HWD-RAS") may be configured to provide an Augmented Reality ("AR") enhanced collaboration, maintenance or training by communicating instruction using an AR platform.”), comprising: 
a camera configured to collect color information of a color image of a scene (col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model.”),
a depth camera configured to collect distance information of a depth image of the scene (col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col.5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”),
an inertial measurement unit (IMU) configured to collect velocity of the wearable visual enhancement device (col.7, lines 23-28 “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”),
a near eye display (col.8, lines 4-8 “By way of non-limiting example, the wearable/mobile computing device 20 may be an HMD device where the display 23 may be mounted in the HMD device. The HMD device may include see-through lenses …”), a processor, and a non-transitory computer readable medium that store instructions, when executed by the processor (col.14, lines 1-5 “In view of the above, a non-transitory processor readable storage medium is provided. The storage medium may comprise an executable computer program product which further comprises a computer software code that, when executed on a processor”), causes the processor to:
scan a scene in a real world in a forward field-of-view of a first user by the camera and the depth camera (col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col.5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”; col.6, lines 47-53 “FIG. 5 illustrates a block diagram of an HWD remote assistant system for network-based collaboration, training and/or maintenance in accordance with an embodiment. At a first location, User A 40 uses a wearable/mobile computing device 20. With the device 20, …”),
generate sensor data associated with one or more objects in the scene by the inertial measurement unit (IMU) (col.4, lines 53-67 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.”; col.7, lines 23-28 “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”); and transmit the sensor data to a computing system at a second location (col.6, lines 47-59 “FIG. 5 illustrates a block diagram of an HWD remote assistant system for network-based collaboration, training and/or maintenance in accordance with an embodiment. At a first location, User A 40 uses a wearable/mobile computing device 20. With the device 20, User A scans an …”). In addition, the same motivation is used as the rejection for claim 2. Gauglitz, Wright and Smith are understood to be silent on the remaining limitations of claim 17.
However, Naimark teaches an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the device (¶0018 “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”), generate sensor data associated with one or more objects in the scene by the inertial measurement unit (IMU) (¶0018 “… The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 2.
Thus, the combination of Gauglitz, Wright, Smith and Naimark teaches a wearable visual enhancement device, comprising: a camera configured to collect color information of a color image of a scene, a depth camera configured to collect distance information of a depth image of the scene, an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the wearable visual enhancement device, a near eye display, a processor, and a non-transitory computer readable medium that store instructions, when executed by the processor, causes the processor to: scan a scene in a real world in a forward field-of-view of a first user by the camera and the depth camera, generate sensor data associated with one or more objects in the scene by the inertial measurement unit (IMU), and transmit the sensor data to a computing system at a second location; receive, from the computing system at the second location, only information that identifies a mark associated with a first object in the scene, display the mark adjacent to the first object by the near-eye display, receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal.
Regarding claim 18, Gauglitz, Wright, Smith and Naimark teach the wearable visual enhancement device of claim 17, wherein the instructions further cause the processor to generate degree of freedom (DoF) information at least partially based on the acceleration and angular velocity (col.7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 17.
3.	Claims 4, 6, 12, 14, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Wright et al, U.S Patent Application Publication No. 20200394012 (“Wright”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of Xue et al, U.S Patent Application Publication No. 2020/0098186 (“Xue”) 
Regarding claim 4, Gauglitz, Wright, Smith and Naimark teach the remote assistance system of claim 3, wherein the wearable visual enhancement device includes a first communication unit configured to transmit the color information of the color image, and the distance information of the depth image to the computing system at the second location (col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 …”).
In the same field of endeavor, Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information to the computing system at the second location (¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”)

Thus, the combination of Gauglitz, Wright, Smith, Naimark and Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location.
Regarding claim 6, Gauglitz, Wright, Smith, Naimark and Xue teach the remote assistance system of claim 4, wherein the computing system includes a second communication unit configured to receive the DoF information, the color information, and the distance information (col.6, lines 53-59 of Smith “The captured 3D model may be communicated to a second location, such as over a network 45, to a computing device 60, or processor. In an embodiment, the captured 3D model of the object 35 may be communicated to a user B 50. The captured 3D model 65 may be received by the computing device 60 and displayed to the user B 50 via the display of the computing device 60.”; ¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system.”; ¶0049 “The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available”; ¶0148 of Xue “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation is used as the rejection for claim 4.
Regarding claim 12, Gauglitz, Wright, Smith and Naimark teach the method of claim 11, further comprising transmitting, by a first communication unit, the color information of the color image, and the distance information of the depth image to the computing system at the second location (col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.”; col.6, lines 47-59 of Smith “FIG. 5 illustrates a block diagram of an HWD …”).
In the same field of endeavor, Xue teaches further comprising transmitting, by a first communication unit, the DoF information, to the computing system at the second location(¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”) In addition, the same motivation is used as the rejection for claim 4.
Thus, the combination of Gauglitz, Wright, Smith, Naimark and Xue teaches further comprising transmitting, by a first communication unit, the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location.
Regarding claim 14, Gauglitz, Wright, Smith, Naimark and Xue teach the method of claim 12, further comprising receiving, by a second communication unit, the DoF information, the color information, and the distance information (col.6, lines 53-59 of Smith “The captured 3D model may be communicated to a second location, such as over a network 45, to a computing device 60, or processor. In an embodiment, the captured 3D model of the object 35 may be communicated to a user B 50. The captured 3D model 65 may be received by the computing device 60 and displayed to the user B 50 via the display of the computing device 60.”; ¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system.”; ¶0049 “The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available”; ¶0148 of Xue “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation is used as the rejection for claim 4.
Regarding claim 19, Gauglitz, Wright, Smith and Naimark teach the wearable visual enhancement device of claim 18, wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location (col.4, lines 52-67 “Additionally, the HWD device is configured to perform depth/Red-…”).
In the same field of endeavor, Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information to the computing system at the second location (¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation is used as the rejection for claim 4.
Thus, the combination of Gauglitz, Wright, Smith, Naimark and Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location.
4.	Claims 5 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Wright et al, U.S Patent Application Publication No. 20200394012 (“Wright”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of THUDOR, WO2019/055389 (“THUDOR”) further in view of Marlatt et al, U.S Patent Application Publication No. 2015/0201198 (“Marlatt”)
Regarding claim 5, Gauglitz, Wright, Smith and Naimark teach the remote assistance system of claim 3, wherein the wearable visual enhancement device further includes an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information (¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system. The local user's system receives information about annotations from the remote system and uses this information together with the live video to render the augmented view.” where Gauglitz teaches encoding the live video stream along with the tracked camera pose, which is considered as combining information into a frame; col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.” where Smith teaches the color information of the color image and the distance information of the depth image; col.7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are … As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values” where Naimark teaches the DoF information). In addition, the same motivation is used as the rejection for claim 2.
Gauglitz teaches combining information into a frame. Smith and Naimark teach the color information of the color image, the distance information of the depth image, and the DoF information. However, Gauglitz, Wright, Smith and Naimark are understood to be silent on combining the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
In the same field of endeavor, THUDOR teaches an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information (see abstract “a sequence of three-dimension scenes is encoded as a video by an encoder and transmitted to a decoder which retrieves the sequence of 3D scenes. Points of a 3D scene visible from a determined point of view are encoded as a color image in a first track of the stream in order to be decodable independently from other tracks of the …”).
Therefore, in combination with Gauglitz, Wright, Smith and Naimark, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method for providing an annotation created at the remote user’s device to the local user’s device of Gauglitz with the encoding of depth and color information of the scene as seen in THUDOR, because this modification would carry data representative of a volumetric scene that can be encoded at once and decoded either as a 3DOF video or as a volumetric video (3DoF+ or 6DoF) and would require a smaller amount of data than the Multiview+ Depth (MDV) standard encoding (col.2, lines 30-33 of THUDOR). Gauglitz, Wright, Smith, Naimark and THUDOR are understood to be silent on the remaining limitations of claim 5.
However, Marlatt teaches wherein the device further includes an image integration unit configured to combine the information of the image that share a timestamp into a frame (¶0039 “In order to avoid the need for synchronization of frames between different streams on the client, and as described herein, it is possible to synchronize the frames of the different encodings on the camera, wrap all frames with the same UTC timestamp into a container frame, and transmit a single stream of container frames to the client. A video source device, such as, for example, a camera, generates source video comprising source frames. The camera applies a UTC timestamp to each source frame ("source frame timestamp"). The video source device generates multiple encodings of each source frame, each of which is distinguished from the other encodings by using at least one different encoding parameter. Because each of the source frame encodings is generated from the same source frame, they all share the same timestamp. The video source device generates a container frame from the source frame encodings sharing a common timestamp. The video source device appends a timestamp ("container timestamp") to a header of the container frame ("container frame header") that is identical to the timestamps of the various source frame encodings”).
Therefore, in combination with Gauglitz, Wright, Smith, Naimark and THUDOR, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method for providing an annotation created at the remote user’s device to the local user’s device of Gauglitz, and the encoding of depth and color information of the scene as seen in THUDOR, with generating a container frame from the source frame encodings sharing a common timestamp as seen in Marlatt, because this modification would synchronize the frames of the different encodings on the camera (¶0039 of Marlatt).
Thus, the combination of Gauglitz, Wright, Smith, Naimark, THUDOR and Marlatt teaches wherein the wearable visual enhancement device further includes an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
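For illustration only, the limitation of combining the color information, the distance information, and the DoF information that share a timestamp into a frame can be sketched as below. This is a minimal sketch under stated assumptions and is not code from the application or from any cited reference: the names `Sample`, `Frame`, and `combine_by_timestamp`, and the millisecond timestamp field, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One sensor reading (hypothetical type, for illustration only)."""
    kind: str          # "color", "depth", or "dof"
    timestamp_ms: int  # capture time; the millisecond unit is an assumption
    payload: Any


@dataclass
class Frame:
    """Container holding the readings that share one timestamp."""
    timestamp_ms: int
    color: Any
    depth: Any
    dof: Any


REQUIRED_KINDS = ("color", "depth", "dof")


def combine_by_timestamp(samples: Iterable[Sample]) -> List[Frame]:
    """Group samples by timestamp and emit a frame only for complete groups."""
    by_time: Dict[int, Dict[str, Any]] = {}
    for s in samples:
        by_time.setdefault(s.timestamp_ms, {})[s.kind] = s.payload

    frames: List[Frame] = []
    for ts in sorted(by_time):
        group = by_time[ts]
        # A frame needs all three kinds of information at the same timestamp.
        if all(kind in group for kind in REQUIRED_KINDS):
            frames.append(Frame(ts, group["color"], group["depth"], group["dof"]))
    return frames
```

In this sketch, a group that lacks one of the three kinds of information at a given timestamp is simply dropped; that policy is an assumption made for brevity and is not drawn from the record.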
Regarding claim 20, Gauglitz, Wright, Smith and Naimark teach the wearable visual enhancement device of claim 18, wherein the instructions further cause the processor to combine the color information of the color image, the distance information of the depth image, and the DoF information (col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.” where Smith teaches the color information of the color image and the distance information of the depth image; col.7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are … As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values” where Naimark teaches the DoF information).
Gauglitz teaches combining information into a frame. Smith and Naimark teach the color information of the color image, the distance information of the depth image, and the DoF information. However, Gauglitz, Wright, Smith and Naimark are understood to be silent on combining the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
In the same field of endeavor, THUDOR teaches wherein the instructions further cause the processor to combine the color information of the color image, the distance information of the depth image, and the DoF information (see abstract “a sequence of three-dimension scenes is encoded as a video by an encoder and transmitted to a decoder which retrieves the sequence of 3D scenes. Points of a 3D scene visible from a determined point of view are encoded as a color image in a first track of the stream in order to be decodable independently from other tracks of the stream.”). In addition, the same motivation as in the rejection of claim 5 applies. Gauglitz, Wright, Smith, Naimark and THUDOR are understood to be silent on the remaining limitations of claim 20.
However, Marlatt teaches wherein the instructions further cause the processor to combine the information of the image that share a timestamp into a frame (¶0039 “In order to avoid the need for synchronization of frames between different streams on the client, and as described herein, it is possible to synchronize the frames of the different encodings on the camera, wrap all frames with the same UTC timestamp into a container frame, and transmit a single stream of container frames to the client. A video source device, such as, for example, a camera, generates source video comprising source frames. The camera applies a UTC timestamp to each source frame ("source frame timestamp"). The video source device generates multiple encodings of each source frame, each of which is distinguished from the other encodings by using at least one different encoding parameter. Because each of the source frame encodings is generated from the same source frame, they all share the same timestamp. The video source device generates a container frame from the source frame encodings sharing a common timestamp. The video source device appends a timestamp ("container timestamp") to a header of the container frame ("container frame header") that is identical to the timestamps of the various source frame encodings”) In addition, the same motivation is used as the rejection for claim 5.
Thus, the combination of Gauglitz, Wright, Smith, Naimark, THUDOR and Marlatt teaches wherein the instructions further cause the processor to combine the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
5.  Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Smith et al, U.S Patent No. 9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of Xue et al, U.S Patent Application Publication No. 2020/0098186 (“Xue”) further in view of THUDOR, WO2019/055389 (“THUDOR”).
Regarding claim 7, Gauglitz ,Wright, Smith ,Naimark and Xue teach the remote assistance system of claim 6, wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received information (¶0049 of Gauglitz The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available. ¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within 
In the same field of endeavor, THUDOR teaches wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received DoF information, the color information, and the distance information (col.7, lines 15-30 “According to the present principles, a decoding method implemented in a decoder is disclosed. The decoder obtains a stream encoded according the present encoding method from a source, for example a memory or a network interface. The stream comprises at least two elements of syntax, a first element of syntax carrying data representative of a 3D scene for a 3DoF rendering. In an embodiment, this first element of syntax comprises a color image encoded according to a projection mapping of points of the 3D scene to the color image from a determined point of view. The at least one second element of syntax of the stream carries data required by a volumetric renderer to render the 3D scene in 3DoF+ or 6DoF mode. The decoder decodes the first color image from the first element of syntax of the stream. In case the decoder is configured to decode the stream for a 3DoF rendering, the decoder provides a further circuit, for example to a Tenderer or to a format converter with the decoded data from the first element of syntax of the stream. In case the decoder is 
Therefore, in combination of Gauglitz, Smith, Naimark and Xue, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz, which provide annotations created at the remote user’s device to the local user’s device, with the encoding/decoding of depth information and of depth and color of the scene as seen in THUDOR, because this modification would carry data representative of a volumetric scene that can be encoded at once and decoded either as a 3DoF video or as a volumetric video (3DoF+ or 6DoF) and would require a smaller amount of data than the Multiview+ Depth (MDV) standard encoding (col.2, lines 30-33 of THUDOR).
Thus, the combination of Gauglitz, Wright, Smith, Naimark, Xue and THUDOR teaches wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received DoF information, the color information, and the distance information.
Regarding claim 15, Gauglitz, Wright, Smith, Naimark and Xue teach the method of claim 14. The remainder of claim 15 is similar in scope to claim 7 and is therefore rejected under the same rationale.
6. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Wright et al, U.S Patent Application Publication No. 20200394012 (“Wright”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent 
Regarding claim 13, Gauglitz ,Wright, Smith ,Naimark and Xue teach the method of claim 12, further comprising combine, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information (¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system. The local user's system receives information about annotations from the remote system and uses this information together with the live video to render the augmented view.” where teaches encode live video stream along with tracked camera pose which is considered as combine information into a frame ; col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. 
Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).)”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values” ¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6 -DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 
Gauglitz teaches combining information into a frame. Smith, Naimark and Xue teach the color information of the color image, the distance information of the depth image, and the DoF information. However, Gauglitz, Wright, Smith, Naimark and Xue are understood to be silent on combining the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
In the same field of endeavor, THUDOR teaches combining, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information (see abstract “a sequence of three-dimension scenes is encoded as a video by an encoder and transmitted to a decoder which retrieves the sequence of 3D scenes. Points of a 3D scene visible from a determined point of view are encoded as a color image in a first track of the stream in order to be decodable independently from other tracks of the stream. The color image is compatible with a three degrees of freedom rendering. Depth information and depth and color of residual points of the scene are encoded in separate tracks of the stream and are decoded only in case the decoder is configured to decode the scene for a volumetric rendering.”)
Therefore, in combination of Gauglitz ,Wright, Smith ,Naimark and Xue, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify system, method for providing created annotation at remote user’s device to local user’s device of Gauglitz with encode depth information, depth and color  of the scene 
However, Marlatt teaches further comprising combining, by an image integration unit, the information of the  image that share a timestamp into a frame (¶0039 “In order to avoid the need for synchronization of frames between different streams on the client, and as described herein, it is possible to synchronize the frames of the different encodings on the camera, wrap all frames with the same UTC timestamp into a container frame, and transmit a single stream of container frames to the client. A video source device, such as, for example, a camera, generates source video comprising source frames. The camera applies a UTC timestamp to each source frame ("source frame timestamp"). The video source device generates multiple encodings of each source frame, each of which is distinguished from the other encodings by using at least one different encoding parameter. Because each of the source frame encodings is generated from the same source frame, they all share the same timestamp. The video source device generates a container frame from the source frame encodings sharing a common timestamp. The video source device appends a timestamp ("container timestamp") to a header of the container frame ("container frame header") that is identical to the timestamps of the various source frame encodings”)

Thus, the combination of Gauglitz, Wright, Smith, Naimark, Xue, THUDOR and Marlatt teaches further comprising combining, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
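For illustration only, the limitation of combining information "that share a timestamp into a frame" can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the names `Frame` and `integrate`, the representation of each stream as a list of (timestamp, payload) pairs, and the choice to skip timestamps that are missing from any stream are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame holding the three items that share one timestamp."""
    timestamp: int
    color: Any   # color information of the color image
    depth: Any   # distance information of the depth image
    dof: Any     # degree-of-freedom (pose) information


def integrate(color: List[Tuple[int, Any]],
              depth: List[Tuple[int, Any]],
              dof: List[Tuple[int, Any]]) -> List[Frame]:
    """Combine samples from three streams that share a timestamp into frames.

    Assumption: a timestamp absent from any one stream yields no frame.
    """
    by_ts: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for name, stream in (("color", color), ("depth", depth), ("dof", dof)):
        for ts, payload in stream:
            by_ts[ts][name] = payload

    frames = []
    for ts in sorted(by_ts):
        parts = by_ts[ts]
        if len(parts) == 3:  # all three kinds of information are present
            frames.append(Frame(ts, parts["color"], parts["depth"], parts["dof"]))
    return frames
```

In this sketch, a sample whose timestamp has no counterpart in the other two streams is simply left out of the output; a real device might instead interpolate or buffer, which the cited passages do not specify.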

=====================================
1.	Claims 1, 8-9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S Patent Application Publication No. 20090289956 (“Doris”)
Regarding independent claim 1, Gauglitz teaches a remote assistance system (Fig.2), comprising: 
a wearable visual enhancement device at a first location (¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR The system could equally be used with other AR displays such as a head-worn or projective display” where a head –worn is considered as a wearable visual enhancement device) configured to: 
scan a scene in a real world in a forward field-of-view of a first user (¶0031 “FIG. 1 shows an example live Augmented Reality-based remote collaboration system 100, according to an embodiment. FIG. 1 shows the user in physical location A in front of a car engine, identifying a particular element, which the user in physical location B has marked with the yellow dot.”; ¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”),
generate sensor data associated with one or more objects in the scene, and transmit the sensor data (¶0043 “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system”);  and 
a computing system at a second location (¶0041 “FIG. 2 shows an overview of the system architecture 200, according to an embodiment. System architecture 200 shows both the local user's system on top and the remote user's system on bottom. In ”) configured to: 
receive the sensor data (¶0047 “The system consists of five main modules--network module, 3D modeler, camera control, annotation control, renderer--and the framework to hold them together”; ¶0049 “The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available.”), 
generate a 3D scene including 3D models of the one or more objects (¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within some tolerance as tentative 3D points. If a feature has previously been matched to features from other frames, we check for mutual epipolar consistency of all observations and merge them into a single 3D point if possible; otherwise, the two 3D points remain as competing hypotheses”; ¶0068 “The renderer renders the scene using the 3D model, the continually updated keyframes, the incoming live camera frame (including live camera pose), the virtual camera pose, and the annotations”; ¶0076 “ Currently, the 3D , 
receive, via input by a second user, a mark associated with one of the 3D models (¶0066 “The remote user sets a marker by simply left-clicking into the view (irrespective if "live" or "decoupled"). The depth of the marker is derived from the 3D model, presuming that the user wants to mark things on physical surfaces rather than in mid-air. Pressing the space bar removes”), and 
transmit only information that identifies the mark to the wearable visual enhancement device wherein the wearable visual enhancement device is further configured to display the mark adjacent to the object corresponding to the one of the 3D models (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”) Gauglitz is understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor, Doris teaches transmit only information that identifies the mark to the wearable visual enhancement device wherein the wearable visual enhancement device is further configured to display the mark adjacent to the object (¶0054 “The server may obtain (e.g., retrieve and/or generate) overlay information for use in generating a transparent overlay via the device using at he server may then transmit the overlay information to the device at 306.”)
receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal(¶0057 “In this example, the transparent overlay includes three different virtual billboards, each of which is placed in front of a business (e.g, restaurant) with which it is associated. The first virtual billboard 402 is a billboard associated with a McDonald's restaurant, the second virtual billboard 404 is a billboard associated with Bravo Cucina restaurant, and the third virtual billboard 406 is associated with Georges restaurant 408. As shown at 402, a virtual billboard may provide an advertisement, menu and/or additional functionality. For instance, a user may place an order to the business via the associated virtual billboard and/or pay for the order electronically, enabling the user to walk in to the business and pick up the order. For instance, the user may "grab and pull" to increase the size of a virtual billboard or menu, or "grab and push" to reduce the size of a virtual billboard or menu.” where user can increase/reduce a virtual billboard therefore, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify a virtual billboard as seen in Doris with a mark because this modification would adjust the transparent overlay by the user of the reality overlay device (¶0057))
Therefore, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz, which provide annotations created at the remote user’s device to the local user’s device, with increasing or reducing the size of virtual images that are generated at a remote location and displayed on a reality overlay device, as seen in Doris, because this modification would allow the user to edit the transparent overlay (¶0057 of Doris).
Thus, the combination of Gauglitz and Doris teaches all limitations of claim 1.
Regarding claim 8, Gauglitz and Doris teaches the remote assistance system of claim 1, wherein the computing system is further configured to adjust a virtual perception of the second user in the 3D scene in response to users inputs from the second user (¶0060-0061 of Gauglitz “ The user can also zoom into and out of the 
Regarding independent claim 9, Gauglitz teaches a method for remote assistance, comprising: 
scanning, by a wearable visual enhancement device at a first location, a scene in a real world in a forward field-of-view of a first user (¶0042 “ The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”);
 generating, by the wearable visual enhancement device, sensor data associated with one or more objects in the scene (¶0043 “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system”); 
generating, by a computing system at a second location, a 3D scene including 3D models of the one or more objects (¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within some tolerance as tentative 3D points. If a feature has previously been matched to features from other frames, we check for mutual epipolar consistency of all observations and merge them into a single 3D point if possible; otherwise, the two 3D points remain as competing hypotheses”; ¶0068 “The renderer renders the scene using the 3D model, the continually updated keyframes, the incoming live camera frame (including live camera pose), the virtual camera pose, and the annotations”; ¶0076 “ Currently, the 3D model is available only on the remote user's side” where 3D model is available only on remote user’ side and Fig. 2 shows the system consists of five main modules--network module, 3D modeler, camera control, annotation control, renderer is same side with remote user’s side.)”; 
receiving, via input to the computing system by a second user, a mark associated with one of the 3D models(¶0066 “The remote user sets a marker by simply left-clicking into the view (irrespective if "live" or "decoupled"). The depth of the ; 
transmitting, by the computing system, only information that identifies the mark to the wearable visual enhancement device; displaying, by the wearable visual enhancement device, the mark adjacent to the object corresponding to the one of the 3D models (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”) Gauglitz is understood to be silent on the remaining limitations of claim 9.
In the same field of endeavor, Doris teaches transmitting, by the computing system, only information that identifies the mark to the wearable visual enhancement device; displaying, by the wearable visual enhancement device, the mark adjacent to the object (¶0054 “The server may obtain (e.g., retrieve and/or generate) overlay information for use in generating a transparent overlay via the device using at least a portion of the captured information and/or at least a portion of any user information that has been received at 304, wherein the transparent overlay provides one or more transparent images that are pertinent to the physical surroundings. For instance, the server may identify one or more entities in the visual information using at least a portion of the received information. Thus, the server may support pattern recognition, as well as other features. The server may also identify one or more entities that are within a specific distance from the location of the reality overlay device. The he server may then transmit the overlay information to the device at 306.”)
receiving, by the wearable visual enhancement device, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information; and modifying, by the wearable visual enhancement device, the displayed mark in accordance with the received edit signal (¶0057 “In this example, the transparent overlay includes three different virtual billboards, each of which is placed in front of a business (e.g, restaurant) with which it is associated. The first virtual billboard 402 is a billboard associated with a McDonald's restaurant, the second virtual billboard 404 is a billboard associated with Bravo Cucina restaurant, and the third virtual billboard 406 is associated with Georges restaurant 408. As shown at 402, a virtual billboard may provide an advertisement, menu and/or additional functionality. For instance, a user may place an order to the business via the associated virtual billboard and/or pay for the order electronically, enabling the user to walk in to the business and pick up the order. As one example, the user may place the order via a command such as a voice command such as "place order at McDonalds." As another example, the user of the reality overlay device may virtually touch a "Start Order Now" button that is displayed in the transparent overlay by lifting his or her hand into the user's field of vision. In this manner, the user may silently interact with the reality overlay device using a gestural interface. 
Such physical movements may also be used to modify the For instance, the user may "grab and pull" to increase the size of a virtual billboard or menu, or "grab and push" to reduce the size of a virtual billboard or menu.” where user can increase/reduce a virtual billboard therefore, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify a virtual billboard as seen in Doris with a mark because this modification would adjust the transparent overlay by the user of the reality overlay device (¶0057)) In addition, the same motivation is used as the rejection for claim 1.
Thus, the combination of Gauglitz and Doris teaches all limitations of claim 9.
Regarding claim 16, Gauglitz and Doris teach the method of claim 9. The remainder of claim 16 is similar in scope to claim 8 and is therefore rejected under the same rationale.
2.	Claims 2-3, 10-11, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S Patent Application Publication No. 20090289956 (“Doris”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”)
Regarding claim 2, Gauglitz and Doris teach the remote assistance system of claim 1. Gauglitz and Doris are understood to be silent on the remaining limitations of claim 2.
In the same field of endeavor, Smith teaches wherein the wearable visual enhancement device includes a camera configured to collect color information of a color image of the scene, a depth camera configured to collect distance information of a depth image of the scene (col. 4, lines 52-59 of Smith “Additionally, vides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col.5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”), and an inertial measurement unit (IMU) configured to collect velocity of the wearable visual enhancement device (col.7, lines 23-28  of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).)”) 
Therefore, in combination of Gauglitz and Doris, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify system, method for providing created annotation at remote user’s device to local user’s device of Gauglitz with using the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor at local user’s side of Smith because this modification 
In the same field of endeavor, Naimark teaches an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the device (¶0018 “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”)
 Therefore, in combination of Gauglitz , Doris and Smith, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify system, method for providing created annotation at remote user’s device to local user’s device of Gauglitz with including the inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers as seen in Naimark because this modification would provide six 
Thus, the combination of Gauglitz, Doris, Smith and Naimark teaches all limitations of claim 2.
Regarding claim 3, Gauglitz, Doris, Smith and Naimark teach the remote assistance system of claim 2, wherein the wearable visual enhancement device includes a tracker configured to generate degree of freedom (DoF) information at least partially based on the acceleration and angular velocity (col.7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 2.
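For illustration only, the relationship between IMU readings and six-degree-of-freedom information discussed above can be sketched as a simple dead-reckoning step. This sketch is not taken from Smith or Naimark. The names `Pose6DoF` and `integrate_imu` are hypothetical, and the sketch assumes that acceleration is already expressed in the world frame with gravity removed and that small-angle Euler integration is acceptable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pose6DoF:
    """Hypothetical container: 3 translation values and 3 rotation values."""
    position: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    velocity: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    # roll, pitch, yaw in radians
    orientation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])


def integrate_imu(pose: Pose6DoF, accel: List[float], gyro: List[float], dt: float) -> Pose6DoF:
    """Advance the pose by one time step of length dt (seconds).

    Assumption: accel is world-frame linear acceleration with gravity
    already removed; gyro is angular velocity about the roll, pitch and
    yaw axes. A real tracker would also correct for drift.
    """
    velocity = [v + a * dt for v, a in zip(pose.velocity, accel)]
    position = [p + v * dt for p, v in zip(pose.position, velocity)]
    orientation = [o + w * dt for o, w in zip(pose.orientation, gyro)]
    return Pose6DoF(position, velocity, orientation)


if __name__ == "__main__":
    pose = Pose6DoF()
    for _ in range(2):
        pose = integrate_imu(pose, accel=[1.0, 0.0, 0.0], gyro=[0.0, 0.0, 0.5], dt=0.5)
    print(pose)
```

The three position values and three orientation values together correspond to the translation-related and rotation-related values described in the quoted passage of Naimark.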
Regarding claim 10, Gauglitz and Doris teach the method of claim 9. The remaining limitations of claim 10 are similar in scope to claim 2 and are therefore rejected under the same rationale.
Regarding claim 11, Gauglitz, Doris, Smith and Naimark teach the method of claim 10. The remaining limitations of claim 11 are similar in scope to claim 3 and are therefore rejected under the same rationale.
Regarding independent claim 17, Gauglitz teaches a wearable visual enhancement device (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display”), comprising:
a near eye display (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display”), a processor, and a non-transitory computer readable medium that store instructions, when executed by the processor (¶0030 “The functions or algorithms described herein may be implemented in hardware, software, or a combination of software and hardware. The software comprises computer executable instructions stored on computer readable media such as memory or other type of storage devices. Further, described functions may correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions are performed in one or more modules as desired, and the embodiments described are merely examples. The software is executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a system, such as a personal computer, a mobile device, server, a router, or other device capable of processing data including network interconnection device”), causes the processor to:
scan a scene in a real world in a forward field-of-view of a first user by the camera (¶0042 “The local user's interface, running on a lightweight tablet, is intentionally simple. From the user's perspective, it behaves exactly like a live video of the user's own view plus AR annotations, i.e., a classic magic lens. (The local user's view is not affected by remote user's camera control.) The only control the user exerts during its operation is by manipulating the position and orientation of the device. The system could equally be used with other AR displays such as a head-worn or projective display.”),
generate sensor data associated with one or more objects in the scene and transmit the sensor data to a computing system at a second location (¶0043 “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system.”),
receive, from the computing system at the second location, only information that identifies a mark associated with a first object in the scene (¶0066 “The remote user sets a marker by simply left-clicking into the view (irrespective if "live" or "decoupled"). The depth of the marker is derived from the 3D model, presuming that the user wants to mark things on physical surfaces rather than in mid-air. Pressing the space bar removes”) and display the mark adjacent to the first object by the near-eye display (¶0064 “In addition to being able to control the viewpoint, the remote user can set and remove virtual annotations. Annotations are saved in 3D world coordinates, are shared with the local user's mobile device via the network, and immediately appear in all views of the world correctly anchored to their 3D world position (cf. FIGS. 1 and 3).”) Gauglitz is understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Doris teaches receive, from the computing system at the second location, only information that identifies a mark associated with a first object in the scene, display the mark adjacent to the first object by the near-eye display, (¶0054 “The server may obtain (e.g., retrieve and/or generate) overlay information for use in generating a transparent overlay via the device using at least a portion of the captured information and/or at least a portion of any user information that has been received at 304, wherein the transparent overlay provides one or more transparent images that are pertinent to the physical surroundings. For he server may then transmit the overlay information to the device at 306.”)
receive, via input by the first user, an edit signal to modify the displayed mark identified by the transmitted information, and modify the displayed mark in accordance with the received edit signal(¶0057 “In this example, the transparent overlay includes three different virtual billboards, each of which is placed in front of a business (e.g, restaurant) with which it is associated. The first virtual billboard 402 is a billboard associated with a McDonald's restaurant, the second virtual billboard 404 is a billboard associated with Bravo Cucina restaurant, and the third virtual billboard 406 is associated with Georges restaurant 408. As shown at 402, a virtual billboard may provide an advertisement, menu and/or additional functionality. For instance, a user may place an order to the business via the associated virtual billboard and/or pay for the order electronically, enabling the user to walk in to the business and pick up the order. As one example, the user may place the order via a command such as a voice command such as "place order at McDonalds." As another example, the user of the reality overlay device may virtually touch a "Start Order Now" button that is displayed in For instance, the user may "grab and pull" to increase the size of a virtual billboard or menu, or "grab and push" to reduce the size of a virtual billboard or menu.” where user can increase/reduce a virtual billboard therefore, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify a virtual billboard as seen in Doris with a mark because this modification would adjust the transparent overlay by the user of the reality overlay device (¶0057)) In addition, the same motivation is used as the rejection for claim 1. Both Gauglitz and Doris are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Smith teaches a wearable visual enhancement device (col.4, lines 4-7 “Head Wearable Display ("HWD") Remote Assistant System ("HWD-RAS") may be configured to provide an Augmented Reality ("AR") enhanced collaboration, maintenance or training by communicating instruction using an AR platform.”), comprising: 
a camera configured to collect color information of a color image of a scene (col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model.”),
a depth camera configured to collect distance information of a depth image of the scene (col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col.5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”),
an inertial measurement unit (IMU) configured to collect velocity of the wearable visual enhancement device (col.7, lines 23-28 “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”),
a near eye display (col.8, lines 4-8 “By way of non-limiting example, the wearable/mobile computing device 20 may be an HMD device where the display 23 may be mounted in the HMD device. The HMD device may include see-through lenses , a processor, and a non-transitory computer readable medium that store instructions, when executed by the processor (col.14, lines 1-5 “In view of the above, a non-transitory processor readable storage medium is provided. The storage medium may comprise an executable computer program product which further comprises a computer software code that, when executed on a processor”), causes the processor to:
scan a scene in a real world in a forward field-of-view of a first user by the camera and the depth camera col. 4, lines 52-59 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model”; col.5, lines 47-56 “The 3D-MSE may include, by way of non-limiting example, KinFu by Microsoft.RTM. which is an open-source application configured to provide 3D visualization and interaction. The 3D-MSE may process live depth data from a camera/sensor and create a Point Cloud and 3D models for real-time visualization and interaction. A graphics processing unit (GPU) 62, such as by way of non-limiting example a CUDA graphics processing unit, may be used to execute the open-source application. A visualization tool kit 63 may be provided.”; col.6, lines 47-53 “FIG. 5 illustrates a block diagram of an HWD remote assistant system for network-based collaboration, training and/or maintenance in accordance with an embodiment. At a first location, User A 40 uses a wearable/mobile computing device 20. With the device 20, , 
generate sensor data associated with one or more objects in the scene by the inertial measurement unit (IMU) (col.4, lines 53-67 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.”; col.7, lines 23-28 “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”); and transmit the sensor data to a computing system at a second location ( col.6, lines 47-59 “FIG. 5 illustrates a block diagram of an HWD remote assistant system for network-based collaboration, training and/or maintenance in accordance with an embodiment. At a first location, User A 40 uses a wearable/mobile computing device 20. With the device 20, User A scans an . In addition, the same motivation is used as the rejection for claim 2. Gauglitz, Doris and Smith are understood to be silent on the remaining limitations of claim 17.
However, Naimark teaches an inertial measurement unit (IMU) configured to collect acceleration and angular velocity of the device (¶0018 “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”), generate sensor data associated with one or more objects in the scene by the inertial measurement unit (IMU) (¶0018 “The The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”) In addition, the same motivation is used as the rejection for claim 2.
Thus, the combination of Gauglitz, Doris, Smith and Naimark teaches all limitations of claim 17.
Regarding claim 18, Gauglitz, Doris, Smith and Naimark teach the wearable visual enhancement device of claim 17, wherein the instructions further cause the processor to generate degree of freedom (DoF) information at least partially based on the acceleration and angular velocity (col.7, lines 23-28 of Smith “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”). In addition, the same motivation is used as the rejection for claim 17.
3.	Claims 4, 6, 12, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S Patent Application Publication No. 20090289956 (“Doris”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of Xue et al, U.S Patent Application Publication No. 2020/0098186 (“Xue”).
Regarding claim 4, Gauglitz, Doris, Smith and Naimark teach the remote assistance system of claim 3, wherein the wearable visual enhancement device includes a first communication unit configured to transmit the color information of the color image, and the distance information of the depth image to the computing system at the second location (col.4, lines 52-67 of Smith; col.6, lines 47-59 of Smith). In addition, the same motivation is used as the rejection for claim 2. Gauglitz, Doris, Smith and Naimark are silent on transmitting the DoF information.
In the same field of endeavor, Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information to the computing system at the second location (¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6-DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”).
Therefore, in the combination of Gauglitz, Doris, Smith and Naimark, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz, which provide an annotation created at a remote user’s device to a local user’s device, to transmit the DoF information to a computing system at the second location as seen in Xue because this modification would allow the server to render one or more frames for display based on the received pose information (¶0137 of Xue).
Thus, the combination of Gauglitz, Doris, Smith, Naimark and Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location.
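For illustration only, transmitting pose (DoF) information from a wearable device to a remote computing system can be sketched as packing the values into a fixed-size binary message. The wire layout below (one 64-bit timestamp followed by six 32-bit floats) and the function names are assumptions made for this example and do not come from Xue or any other cited reference.

```python
import struct
from typing import Tuple

# Assumed layout: little-endian double timestamp, then x, y, z, roll, pitch, yaw.
POSE_FORMAT = "<d6f"
POSE_SIZE = struct.calcsize(POSE_FORMAT)  # 8 + 6 * 4 = 32 bytes


def pack_pose(timestamp: float,
              position: Tuple[float, float, float],
              orientation: Tuple[float, float, float]) -> bytes:
    """Serialize one 6-DoF sample into a fixed-size message."""
    return struct.pack(POSE_FORMAT, timestamp, *position, *orientation)


def unpack_pose(message: bytes):
    """Recover (timestamp, position, orientation) on the receiving side."""
    values = struct.unpack(POSE_FORMAT, message)
    return values[0], tuple(values[1:4]), tuple(values[4:7])


if __name__ == "__main__":
    msg = pack_pose(1.5, (0.0, 1.0, 2.0), (0.25, 0.5, 0.75))
    print(len(msg), unpack_pose(msg))
```

A fixed-size message keeps the uplink small relative to the video frames sent on the downlink, which is the asymmetry described in the quoted passage of Xue.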
Regarding claim 6, Gauglitz, Doris, Smith, Naimark and Xue teach the remote assistance system of claim 4, wherein the computing system includes a second communication unit configured to receive the DoF information, the color information, and the distance information (col.6, lines 53-59 of Smith “The captured 3D model may be communicated to a second location, such as over a network 45, to a computing device 60, or processor. In an embodiment, the captured 3D model of the object 35 may be communicated to a user B 50. The captured 3D model 65 may be received by the computing device 60 and displayed to the user B 50 via the display of the computing device 60.”; ¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system.”; ¶0049 of Gauglitz “The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available”; ¶0148 of Xue “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6-DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation is used as the rejection for claim 4.
Regarding claim 12, Gauglitz, Doris, Smith and Naimark teach the method of claim 11. The remaining limitations of claim 12 are similar in scope to claim 4 and are therefore rejected under the same rationale.
Regarding claim 14, Gauglitz, Doris, Smith, Naimark and Xue teach the method of claim 12. The remaining limitations of claim 14 are similar in scope to claim 6 and are therefore rejected under the same rationale.
Regarding claim 19, Gauglitz, Doris, Smith and Naimark teach the wearable visual enhancement device of claim 18, wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location (col.4, lines 52-67 “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may communicate sensed depth/RGB data to the 3D modeling simulation engine 32. The depth/RGB transmission block 34 may include data associated with a 3D model of a real-world view of a scene through the lens of the HWD device 100 (as shown in FIG. 4). The scene may include at least one object. By way of a non-limiting example, a depth/RGB sensor 50 (illustrated in FIG. 3A) may include an ASUS.RTM. Xtion sensor.”; col.6, lines 47-59 “FIG. 5 illustrates a block diagram of an HWD remote 
In the same field of endeavor, Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information to the computing system at the second location (¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6-DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation is used as the rejection for claim 4.
Thus, the combination of Gauglitz, Doris, Smith, Naimark and Xue teaches wherein the wearable visual enhancement device includes a first communication unit configured to transmit the DoF information, the color information of the color image, and the distance information of the depth image to the computing system at the second location.
4.	Claims 5 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S Patent Application Publication No. 20090289956 (“Doris”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of THUDOR, WO2019/055389 (“THUDOR”) further in view of Marlatt et al, U.S Patent Application Publication No. 2015/0201198 (“Marlatt”).
Regarding claim 5, Gauglitz, Doris, Smith and Naimark teach the remote assistance system of claim 3, wherein the wearable visual enhancement device further includes an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information (¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system. The local user's system receives information about annotations from the remote system and uses this information together with the live video to render the augmented view.” where teaches encode live video stream along with tracked camera pose which is considered as combine information into a frame; col.4, lines 52-67 of Smith “Additionally, the HWD device is configured to perform depth/Red-Blue-Green ("RGB") sensing via a depth/RGB sensor of an object viewed by User A, depth/RGB transmission 34 to User B, and audio transmission 36a from User A to User B. The depth/RGB sensor provides raw color and depth date of an object at a certain level of discretization size, from which the 3D-modeling simulation engine (3D-MSE) 52 (as illustrated in FIG. 2 creates the 3D model. The depth/RGB transmission block 34 may “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).)”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in space. 
The IMU 113 provides inertial motion parameters to the software 111. The IMU 113 is rigidly attached to the mobile device 102 and thereby provides an indication of the movement of the entire system and is used to determine the pose of the system relative to the real world objects 103A viewed by the camera 112. The inertial parameters provided by the IMU 113 include linear acceleration, angular velocity and gyroscopic orientation with respect to the ground. In one embodiment the sensors include at least 3 accelerometers and 3 gyroscopes mounted orthogonally. As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”, which teaches the DoF information). In addition, the same motivation is used as the rejection for claim 2.
In the same field of endeavor, THUDOR teaches an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information (see abstract “a sequence of three-dimension scenes is encoded as a video by an encoder and transmitted to a decoder which retrieves the sequence of 3D scenes. Points of a 3D scene visible from a determined point of view are encoded as a color image in a first track of the stream in order to be decodable independently from other tracks of the stream. The color image is compatible with a three degrees of freedom rendering. Depth information and depth and color of residual points of the scene are encoded in separate tracks of the stream and are decoded only in case the decoder is configured to decode the scene for a volumetric rendering.”)
Gauglitz teaches combining information into a frame. Smith and Naimark teach the color information of the color image, the distance information of the depth image, and the DoF information. However, Gauglitz, Doris, Smith and Naimark are understood to be silent on combining the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
Therefore, in the combination of Gauglitz, Doris, Smith and Naimark, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz, which provide an annotation created at a remote user’s device to a local user’s device, to encode the depth information and the depth and color of the scene as seen 
However, Marlatt teaches wherein the device further includes an image integration unit configured to combine the information of the image that share a timestamp into a frame (¶0039 “In order to avoid the need for synchronization of frames between different streams on the client, and as described herein, it is possible to synchronize the frames of the different encodings on the camera, wrap all frames with the same UTC timestamp into a container frame, and transmit a single stream of container frames to the client. A video source device, such as, for example, a camera, generates source video comprising source frames. The camera applies a UTC timestamp to each source frame ("source frame timestamp"). The video source device generates multiple encodings of each source frame, each of which is distinguished from the other encodings by using at least one different encoding parameter. Because each of the source frame encodings is generated from the same source frame, they all share the same timestamp. The video source device generates a container frame from the source frame encodings sharing a common timestamp. The video source device appends a timestamp ("container timestamp") to a header of the container frame ("container frame header") that is identical to the timestamps of the various source frame encodings”)

Thus, the combination of Gauglitz, Doris, Smith, Naimark, THUDOR and Marlatt teaches wherein the wearable visual enhancement device further includes an image integration unit configured to combine the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
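For illustration only, combining color, depth and DoF information that share a timestamp into a single frame can be sketched as grouping samples by timestamp, in the manner of the container frame described in the quoted passage of Marlatt. The function name `build_container_frames` and the dictionary layout are hypothetical, and the sketch assumes each stream is supplied as a list of (timestamp, payload) pairs with exactly matching timestamps.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Sample = Tuple[int, Any]  # (timestamp, payload); the payload type is left open


def build_container_frames(color: List[Sample],
                           depth: List[Sample],
                           dof: List[Sample]) -> List[Dict[str, Any]]:
    """Wrap color, depth and DoF samples sharing a timestamp into one frame.

    Timestamps that are missing from any of the three streams are skipped,
    so every returned frame is complete.
    """
    by_timestamp: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for name, samples in (("color", color), ("depth", depth), ("dof", dof)):
        for timestamp, payload in samples:
            by_timestamp[timestamp][name] = payload

    frames = []
    for timestamp in sorted(by_timestamp):
        parts = by_timestamp[timestamp]
        if len(parts) == 3:  # all three streams present for this timestamp
            frames.append({"timestamp": timestamp, **parts})
    return frames


if __name__ == "__main__":
    frames = build_container_frames(
        color=[(1, "c1"), (2, "c2")],
        depth=[(1, "d1"), (2, "d2")],
        dof=[(2, "p2"), (3, "p3")],
    )
    print(frames)
```

Because the grouping happens at the source, the receiving side does not need to synchronize separate streams, which is the benefit identified in the quoted passage.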
Regarding claim 20, Gauglitz, Doris, Smith and Naimark teach the wearable visual enhancement device of claim 18. The remaining limitations of claim 20 are similar in scope to claim 5 and are therefore rejected under the same rationale.
5.	Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S Patent Application Publication No. 20090289956 (“Doris”) further in view of Smith et al, U.S Patent No.9088787 (“Smith”) further in view of Naimark, U.S Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of Xue et al, U.S Patent Application Publication No. 2020/0098186 (“Xue”) further in view of THUDOR, WO2019/055389 (“THUDOR”).
Regarding claim 7, Gauglitz, Doris, Smith, Naimark and Xue teach the remote assistance system of claim 6, wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received information (¶0049 of Gauglitz “The network module receives the data stream from the local user's device, sends the incoming video data on to the decoder, and finally notifies the main module when a new frame (decoded image data+meta-data) is available.”; ¶0050 “A 3D surface model is constructed on the fly from the live video stream and from associated camera poses. Keyframes were selected based on a set of heuristics (good tracking quality, low device movement, minimum time interval & translational distance between keyframes), then detect and describe features in the new frame using SIFT. Four closest existing keyframes were chosen and matched against their features (one frame at a time) via an approximate nearest neighbor algorithm and collect matches that satisfy the epipolar constraint (which is known due to the received camera poses) within some tolerance as tentative 3D points. If a feature has previously been matched to features from other frames, we check for mutual epipolar consistency of all observations and merge them into a single 3D point if possible; otherwise, the two 3D points remain as competing hypotheses”; ¶0068 “The renderer renders the scene using the 3D model, the continually updated keyframes, the incoming live camera frame (including live camera pose), the virtual camera pose, and the annotations”). Gauglitz, Doris, Smith, Naimark and Xue are understood to be silent on the remaining limitations of claim 7.
In the same field of endeavor, THUDOR teaches wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received DoF information, the color information, and the distance information (col.7, lines 15-30 “According to the present principles, a decoding method implemented in a decoder is disclosed. The decoder obtains a stream encoded according the present encoding method from a source, for example a memory or a network interface. The stream comprises at least two elements of syntax, a first element of syntax carrying data representative of a 3D scene for a 3DoF rendering. In an embodiment, this first element of syntax comprises a color image encoded according to a projection mapping of points of the 3D scene to the color image from a determined point of view. The at least one second element of syntax of the stream carries data required by a volumetric renderer to render the 3D scene in 3DoF+ or 6DoF mode. The decoder decodes the first color image from the first element of syntax of the stream. In case the decoder is configured to decode the stream for a 3DoF rendering, the decoder provides a further circuit, for example to a renderer or to a format converter with the decoded data from the first element of syntax of the stream. In case the decoder is configured to decode the stream in a volumetric mode (i.e. 3DoF+ or 6DoF), the decoder decodes data embedded in the at least one second element of syntax and provide a further module, for example a renderer or a format converter, with every decoded data.”).
Therefore, in combination of Gauglitz, Doris, Smith, Naimark and Xue, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz for providing annotations created at a remote user’s device to a local user’s device so as to encode and decode the depth information and the depth and color of the scene as seen in THUDOR, because this modification would carry data representative of a volumetric scene that can be encoded at once and decoded either as a 3DoF video or as a volumetric video (3DoF+ or 6DoF) and would require a smaller amount of data than the Multiview + Depth (MVD) standard encoding (col.2, lines 30-33 of THUDOR).
Thus, the combination of Gauglitz, Doris, Smith, Naimark, Xue and THUDOR teaches wherein the computing system includes a 3D model generator configured to generate the 3D scene based on the received DoF information, the color information, and the distance information.
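For illustration only, the mode-dependent decoding described in the passage quoted from THUDOR can be sketched as follows. This sketch is not taken from THUDOR or any other cited reference; the names Mode, Stream, Decoded and decode are hypothetical, and the actual image decoding is replaced by a pass-through of the raw bytes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    """Rendering modes named in the quoted passage."""
    THREE_DOF = "3DoF"
    THREE_DOF_PLUS = "3DoF+"
    SIX_DOF = "6DoF"


@dataclass
class Stream:
    """Hypothetical stand-in for a stream with two kinds of syntax elements."""
    first_element: bytes          # color image usable for 3DoF rendering
    second_elements: List[bytes]  # data needed only for volumetric rendering


@dataclass
class Decoded:
    color_image: bytes
    volumetric_data: Optional[List[bytes]]  # None when decoded for 3DoF only


def decode(stream: Stream, mode: Mode) -> Decoded:
    """Always decode the first element; decode the rest only in volumetric mode."""
    color = stream.first_element  # placeholder for real image decoding
    if mode is Mode.THREE_DOF:
        return Decoded(color_image=color, volumetric_data=None)
    # 3DoF+ or 6DoF: the second elements are decoded as well.
    return Decoded(color_image=color, volumetric_data=list(stream.second_elements))
```

Under these assumptions, a 3DoF decoder never touches the second elements, which is why the first element is described as decodable independently of the rest of the stream.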
Regarding claim 15, Gauglitz, Doris, Smith, Naimark and Xue teach the method of claim 14. The remainder of claim 15 is similar in scope to claim 7 and is therefore rejected under the same rationale.
6. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Gauglitz et al, U.S. Patent Application Publication No. 20160358383 (“Gauglitz”) in view of Doris et al, U.S. Patent Application Publication No. 20090289956 (“Doris”) further in view of Smith et al, U.S. Patent No. 9088787 (“Smith”) further in view of Naimark, U.S. Patent Application Publication No. 2013/0218461 (“Naimark”) further in view of Xue et al, U.S. Patent Application Publication No. 2020/0098186 (“Xue”) further in view of THUDOR, WO2019/055389 (“THUDOR”) further in view of Marlatt et al, U.S. Patent Application Publication No. 2015/0201198 (“Marlatt”).
Regarding claim 13, Gauglitz, Doris, Smith, Naimark and Xue teach the method of claim 12, further comprising combining, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information (¶0043 of Gauglitz “Under the hood, the system runs a SLAM system and sends the tracked camera pose along with the encoded live video stream to the remote system. The local user's system receives information about annotations from “Additionally, as either the HWD device 100 or the object are moved, the computing system may maintain alignment of the mark or annotation at the fixed location on the object. Though not illustrated, alignment may be maintained with an alignment system or device, such as but not limited to inertial measurement unit (IMU).”; ¶0018 of Naimark “The inertial motion unit (IMU) 113 is a sensing system composed of several inertial sensors which includes an accelerometer, gyroscope, and magnetometers. In other embodiments, additional sensing systems are used which also provide information about movement of the mobile device 102 in As such, the sensors provide six degrees of freedom--3 translation-related values and 3 rotation-related values”; ¶0148 “From the XR server 900, compressed rendered frame video stream is provided to the HMD 910. From the HMD 910, pose information, including, for example, head location, orientation, and 6-DoF information is provided to the XR server 900 for rendering frames. The downlink traffic from the XR server includes two video frames, for example, up to 300 KB per frame for each eye, every 16.7 ms if a 60 frames-per-second rate is maintained.”). In addition, the same motivation applies as in the rejection of claim 4.
Gauglitz teaches combining information into a frame. Smith, Naimark and Xue teach the color information of the color image, the distance information of the depth image, and the DoF information. However, Gauglitz, Doris, Smith, Naimark and Xue are understood to be silent on combining the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
In the same field of endeavor, THUDOR teaches combining, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information (see abstract “a sequence of three-dimension scenes is encoded as a video by an encoder and transmitted to a decoder which retrieves the sequence of 3D scenes. Points of a 3D scene visible from a determined point of view are encoded as a color image in a first track of the stream in order to be decodable independently from other tracks of the stream. The color image is compatible with a three degrees of freedom rendering. Depth information and depth and color of residual points of the scene are encoded in separate tracks of the stream and are decoded only in case the decoder is configured to decode the scene for a volumetric rendering.”)
Therefore, in combination of Gauglitz, Smith, Naimark and Xue, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz for providing annotations created at a remote user’s device to a local user’s device so as to encode the depth information and the depth and color of the scene as seen in THUDOR, because this modification would carry data representative of a volumetric scene that can be encoded at once and decoded either as a 3DoF video or as a volumetric video (3DoF+ or 6DoF) and would require a smaller amount of data than the Multiview + Depth (MVD) standard encoding (col.2, lines 30-33 of THUDOR). Gauglitz, Doris, Smith, Naimark, Xue and THUDOR are understood to be silent on the remaining limitations of claim 13.
However, Marlatt teaches further comprising combining, by an image integration unit, the information of the  image that share a timestamp into a frame (¶0039 “In order to avoid the need for synchronization of frames between different streams on the client, and as described herein, it is possible to synchronize the frames of the different encodings on the camera, wrap all frames with the same UTC timestamp into a container frame, and transmit a single stream of container frames to the client. A video source device, such as, for example, a camera, generates source video comprising source frames. The camera applies a UTC timestamp to each source frame ("source frame timestamp"). The video source device generates multiple encodings of each source frame, each of which is distinguished from the other encodings by using at least one different encoding parameter. Because each of the source frame encodings is generated from the same source frame, they all share the same timestamp. The video source device generates a container frame from the source frame encodings sharing a common timestamp. The video source device appends a timestamp ("container timestamp") to a header of the container frame ("container frame header") that is identical to the timestamps of the various source frame encodings”)
Therefore, in combination of Gauglitz, Smith, Naimark, Xue and THUDOR, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify the system and method of Gauglitz for providing annotations created at a remote user’s device to a local user’s device, together with the encoding of depth information and the depth and color of the scene as seen in THUDOR, to generate a container frame from the source frame encodings sharing a common timestamp as seen in Marlatt, because this modification would synchronize the frames of the different encodings on the camera (¶0039 of Marlatt).
Thus, the combination of Gauglitz,  Doris, Smith, Naimark, Xue, THUDOR and Marlatt teaches further comprising combining, by an image integration unit, the color information of the color image, the distance information of the depth image, and the DoF information that share a timestamp into a frame.
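For illustration only, the container-frame approach quoted from ¶0039 of Marlatt, applied to the three kinds of information recited in claim 13, can be sketched as follows. This sketch is not taken from any cited reference; the names Sample, ContainerFrame and integrate, the integer millisecond timestamp, and the choice to skip incomplete sets are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One timestamped piece of sensor output (hypothetical type)."""
    kind: str        # "color", "depth" or "dof"
    timestamp: int   # assumed: UTC time in milliseconds
    payload: Any


@dataclass
class ContainerFrame:
    """Frame whose header timestamp equals that of every payload it wraps."""
    timestamp: int
    payloads: Dict[str, Any] = field(default_factory=dict)


REQUIRED_KINDS = frozenset({"color", "depth", "dof"})


def integrate(samples: Iterable[Sample]) -> List[ContainerFrame]:
    """Combine samples that share a timestamp into one container frame each.

    Timestamps lacking any of the three kinds are skipped (an assumption;
    the quoted passage does not address incomplete sets).
    """
    by_time: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for sample in samples:
        by_time[sample.timestamp][sample.kind] = sample.payload

    frames: List[ContainerFrame] = []
    for ts in sorted(by_time):
        if REQUIRED_KINDS.issubset(by_time[ts]):
            frames.append(ContainerFrame(timestamp=ts, payloads=dict(by_time[ts])))
    return frames
```

Because every payload in a container frame carries the same timestamp as the frame header, a receiver of such frames would not need to re-synchronize separate color, depth and DoF streams.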
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842. The examiner can normally be reached Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached on (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SARAH LE/Primary Examiner, Art Unit 2619