DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Arguments
Applicant's arguments filed 03/25/2021 have been fully considered but they are not persuasive. 
The applicant argues, “However, Applicant submits that the asserted combination does [not] teach or suggest that overlaying the representation of the user hand on the video includes moving the representation to match movements of the user hand included in the hand information based on the measurements from the orientation sensor.” (See Applicant’s Remarks, page 9, third paragraph)
The examiner respectfully disagrees. Seo and You both teach presenting visual information to a user, and You teaches that combining image-based tracking with inertial tracking provides better registration of augmented data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the system of Seo with the augmentation and tracking techniques of You so that the user would have better registration of augmented image data in the scene while moving around.
The applicant argues that the remainder of the prior art fails to cure the deficiencies of Seo in view of You, and that the remaining independent claims contain amendments and arguments similar to those of claim 1. (See Applicant’s Remarks, page 9, fourth paragraph through last paragraph)
The applicant’s argument has been fully considered but is not persuasive, because there are no deficiencies in the rejection of independent claim 1, as detailed above and in this action.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 4, 6, 7, 10, 12-14, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Seo et al. (“Webizing Collaborative Interaction Space for Cross Reality with Various Human Interface Devices”, ACM, June 2018) (hereinafter referred to as Seo) in view of You et al. (“Hybrid Inertial and Vision Tracking for Augmented Reality Registration”, IEEE, 2002) (hereinafter referred to as You).

Regarding claim 1, Seo teaches a non-transitory computer readable medium, storing instructions for executing a process for a mobile device comprising an orientation sensor and a camera (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph),
the instructions comprising: 
transmitting video from the camera (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph); 
receiving hand information associated with a user hand from the another device (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph);
and overlaying a representation of the user hand on the video for display by the mobile device based on the hand information, the representation of the user hand overlaid on the video in an orientation (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (See Figures 10(a) and 10(b)), but is silent as to transmitting the measurements from the orientation sensor to the another device and as to the orientation being determined from the measurements from the orientation sensor, wherein overlaying the representation of the user hand on the video for display by the mobile device comprises, based on the measurements from the orientation sensor, moving the representation of the user hand on the video for display to match with the movements of the user hand included in the hand information.
You teaches a technique in which image-based tracking is combined with inertial data to provide registration and augmentation in an image (Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).
Seo and You both teach presenting visual information to a user, and You teaches that combining image-based tracking with inertial tracking provides better registration of augmented data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the system of Seo with the augmentation and tracking techniques of You so that the user would have better registration of augmented image data in the scene while moving around.

Regarding claim 4, Seo in view of You teaches the non-transitory computer readable medium of claim 1, the instructions further comprising: establishing, through a web browser, a browser to browser connection from the mobile device to another web browser of the another device; wherein the transmitting the video from the camera and the measurements from the orientation sensor to the another device, and the receiving hand information from the another device is conducted through the browser to browser connection (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Seo; An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. See page 5, section 4.2, second paragraph) (Seo; In this paper, we propose a webizing method for a collaborative interaction space that provides user authentication and manages user sessions in XR environments. In addition, the webizing method supports human interface devices and related events via an interaction adaptor that delivers events based on user session and converts event messages according to XR content types such as VRPN messages, X3D sensor data, and HTML DOM events to deal with interaction events. See abstract) (Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). See page 7, right col., second paragraph) (Seo; XR technology facilitating cross-platform standard eliminating industry fragmentation by enabling applications to be written once to run on any XR system, and to access XR devices has been introduced and now people can easily access XR technology. See page 1, right col., first paragraph).

Regarding claim 6, Seo in view of You teaches the non-transitory computer readable medium of claim 1, wherein the video is previously recorded video, wherein the measurements from the orientation sensor are previously recorded measurements (Data that is transmitted must have been recorded at some earlier point in time, even in the case of real-time transmission) (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph) (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).

Regarding claim 7, Seo teaches a non-transitory computer readable medium, storing instructions for executing a process for a device communicatively coupled to a tracking device (Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph) (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10), the instructions comprising:
receiving video from a mobile device (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph);
transmitting hand information associated with a user hand from the another device, the hand information generated based on measurements obtained from the tracking device (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph); and 
overlaying a representation of the user hand on the video for display by the device based on the hand information, the representation of the user hand overlaid on the video in an orientation (Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (See Figures 10(a) and 10(b)), but is silent as to receiving orientation sensor measurements and as to the orientation being determined from the measurements from the orientation sensor, wherein overlaying the representation of the user hand on the video for display by the mobile device comprises, based on the measurements from the orientation sensor, moving the representation of the user hand on the video for display to match with movements of the user hand included in the hand information.
You teaches a technique in which image-based tracking is combined with inertial data to provide registration and augmentation in an image (Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).
Seo and You both teach presenting visual information to a user, and You teaches that combining image-based tracking with inertial tracking provides better registration of augmented data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the system of Seo with the augmentation and tracking techniques of You so that the user would have better registration of augmented image data in the scene while moving around.

Regarding claim 10, Seo in view of You teaches the non-transitory computer readable medium of claim 7, the instructions further comprising: establishing, through a web browser, a browser to browser connection from the device to another web browser of the mobile device; wherein the receiving the video and the orientation sensor measurements from the mobile device, and the transmitting hand information from the device is conducted through the browser to browser connection (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Seo; An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. See page 5, section 4.2, second paragraph) (Seo; In this paper, we propose a webizing method for a collaborative interaction space that provides user authentication and manages user sessions in XR environments. In addition, the webizing method supports human interface devices and related events via an interaction adaptor that delivers events based on user session and converts event messages according to XR content types such as VRPN messages, X3D sensor data, and HTML DOM events to deal with interaction events. See abstract) (Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). See page 7, right col., second paragraph) (Seo; XR technology facilitating cross-platform standard eliminating industry fragmentation by enabling applications to be written once to run on any XR system, and to access XR devices has been introduced and now people can easily access XR technology. See page 1, right col., first paragraph).

Regarding claim 12, Seo in view of You teaches the non-transitory computer readable medium of claim 7, wherein the video is previously recorded video, wherein the orientation sensor measurements are previously recorded orientation sensor measurements (Data that is transmitted must have been recorded at some earlier point in time, even in the case of real-time transmission) (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph) (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).


Regarding claim 13, Seo teaches a non-transitory computer readable medium, storing instructions for a server (An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. see page 5, section 4.2, second paragraph), the instructions comprising: 
receiving a first connection from a mobile device (An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. see page 5, section 4.2, second paragraph); 
receiving a second connection from another device communicatively coupled to a tracking device (An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. see page 5, section 4.2, second paragraph); 
establishing a third connection between the mobile device and the another device to facilitate transmission of video from the mobile device to the another device and to facilitate transmission of hand information from the another device to the mobile device (An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. see page 5, section 4.2, second paragraph)( Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph), 
the hand information further comprising a color for displaying the hand on a display of the mobile device (See figure 5), 
but is silent as to transmission of the orientation sensor measurements, wherein the hand information comprises movements of a user hand from the tracking device and, based on the orientation sensor measurements, moving a representation of the user hand on a video for display on the mobile device to match with movements of the user hand included in the hand information.
You teaches a technique in which image-based tracking is combined with inertial data to provide registration and augmentation in an image (Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).
Seo and You both teach presenting visual information to a user, and You teaches that combining image-based tracking with inertial tracking provides better registration of augmented data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the system of Seo with the augmentation and tracking techniques of You so that the user would have better registration of augmented image data in the scene while moving around.

Regarding claim 14, Seo in view of You teaches the non-transitory computer readable medium of claim 13, wherein the first connection and the second connection are received through a web browser, wherein the establishing the third connection comprises establishing a direct browser-to-browser connection between the mobile device and the another device (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser. See caption for Figure 10) (Seo; An overview of the proposed system is shown in Figure 3, which demonstrates how to deal with multiple user interactions in a collaborative interaction space of an XR environment. Handling individual user’s interaction events and rendering XR content is processed in the XR web client, and to deliver these to other users, the event handler sends and receives JSON formatted interaction event data through the XR interaction space server. The webizing interaction handler deals with the other users’ received interaction data as if it was self-interaction data. The event handler also deals with content modification and synchronization events. The XR interaction space server manages the XR content cycle and delivers user interaction events among the users in the collaborative interaction space. See page 5, section 4.2, second paragraph) (Seo; In this paper, we propose a webizing method for a collaborative interaction space that provides user authentication and manages user sessions in XR environments. In addition, the webizing method supports human interface devices and related events via an interaction adaptor that delivers events based on user session and converts event messages according to XR content types such as VRPN messages, X3D sensor data, and HTML DOM events to deal with interaction events. See abstract) (Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). See page 7, right col., second paragraph) (Seo; XR technology facilitating cross-platform standard eliminating industry fragmentation by enabling applications to be written once to run on any XR system, and to access XR devices has been introduced and now people can easily access XR technology. See page 1, right col., first paragraph).

Regarding claim 16, Seo in view of You teaches the non-transitory computer readable medium of claim 13, wherein the video is previously recorded video, wherein the orientation sensor measurements are previously recorded orientation sensor measurements (Data is transmitted therefore, has to be previously recorded at some point in time, even when considering real-time transmission) (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph) (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract).

Regarding claim 18, Seo in view of You teaches the non-transitory computer readable medium of claim 1, further comprising: determining a gravity direction of a real-world environment represented in the video from the camera based on the measurements from the orientation sensor, the gravity direction of the real-world environment indicative of ground of the real-world environment; and aligning a gravity direction of the video from the camera to ground of a real-world environment represented in the video from the camera (You; In this section, we analyze the error sensitivity of inertial tracker in an augmented reality tracking system. The inertial device we used for experiment is a three-degree of freedom (3DOF) orientation tracker produced by InterSense (Model IS-300). This device incorporates three orthogonal gyroscopes to sense angular rates of rotation along its three perpendicular axes. It also has sensors for the gravity vector and a compass [7] to compensate for gyro drift. See page 2, section 2.2 Error Sensitivity of Inertial AR Tracking System)(You; Our prototype hybrid tracker fuses inertial orientation (3DOF) data with vision feature tracking to stabilize performance and correct inertial drift. See Section 3. Hybrid Inertial-Vision Tracking).

Claims 2, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Seo et al. (“Webizing Collaborative Interaction Space for Cross Reality with Various Human Interface Devices”, ACM, June, 2018.)(Hereinafter referred to as Seo) in view of You et al. (“Hybrid Inertial and Vision Tracking for Augmented Reality Registration”, IEEE. 2002)(Hereinafter referred to as You) in view of Melax et al. (“Dynamics Based 3D Skeletal Hand Tracking”, May, 2017.)(Hereinafter referred to as Melax).

Regarding claim 2, Seo in view of You teaches the non-transitory computer readable medium of claim 1, wherein the video is live video from the camera of the mobile device (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph),
the measurements are live measurements from the orientation sensor (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract), but is silent to and the hand information comprises hand skeleton joint measurements and hand visualization parameters determined from live movements of the user hand.
Melax teaches a technique for full skeletal tracking of hands for placing for interaction with 3D physics based models (This paper presents a computationally-efficient, camera independent, scalable, physical-simulation-based approach for tracking 3D articulated skeletal models that is able to accurately track the human hand from a single depth sensor. Instead of using dynamics as an isolated step in the pipeline, such as the way an inverse kinematic solver would be applied only after placement of key features is somehow decided, our approach fits the hand to the depth data (or point cloud) by extending a physics system through adding additional constraints. Consequently, fitting the sensor data, avoiding interpenetrating fingers, preserving joint ranges, and exploiting temporal coherence and momentum are all constraints computed simultaneously in a unified solver. See page 1, right col., section 1, introduction last, paragraph)( Today, there already exists a wealth of interaction usages with 3D hand position data [22]. Rich interactive applications can be built merely by assembling physically enabled content [32]. Such physics scenes are ideal for adding hand interaction. With the skeletal pose information from a tracked hand, it is straightforward to use these to animate the bones of a 3D model such as the rig of a skinned hand model. Such a hand model can then use collision detection to interact with virtual objects [12, 21]. Without real-world or haptic feedback, when the bones are controlled directly in this manner it can result in a hand that is too strong. There is nothing preventing it push a heavy box at high speed or penetrating right through a wall. Therefore we use the relative bone orientations from connected bones in the tracking data to drive the “muscles” of a virtual hand in the application - often described as the “powered ragdoll” approach [6]. 
The interactive simulation system in the application applies forces and torques (up to realistic limits) in an attempt to have the virtual hand match the hand pose provided by the tracking. See section 6, page 6).
Seo in view of You and Melax teach of visualization of a hand and Melax teaches that by creating a 3D skeletal tracking model the hand can be skinned to interact with other 3D models based on a user’s movement, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Seo in view of You with the 3D skeletal hand 

Regarding claim 8, Seo in view of You teaches the non-transitory computer readable medium of claim 7, wherein the video is live video from mobile device (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph), 
the orientation sensor measurements are live orientation sensor measurements from the mobile device (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract), but is silent to and the hand information comprises hand skeleton joint measurements and hand visualization parameters determined from live measurements of the user hand from the tracking device.
Melax teaches a technique for full skeletal tracking of hands for placing for interaction with 3D physics based models (This paper presents a computationally-efficient, camera independent, scalable, physical-simulation-based approach for tracking 3D articulated skeletal models that is able to accurately track the human hand from a single depth sensor. Instead of using dynamics as an isolated step in the pipeline, such as the way an inverse kinematic solver would be applied only after placement of key features is somehow decided, our approach fits the hand to the depth data (or point cloud) by extending a physics system through adding additional constraints. Consequently, fitting the sensor data, avoiding interpenetrating fingers, preserving joint ranges, and exploiting temporal coherence and momentum are all constraints computed simultaneously in a unified solver. See page 1, right col., section 1, introduction last, paragraph)( Today, there already exists a wealth of interaction usages with 3D hand position data [22]. Rich interactive applications can be built merely by assembling physically enabled content [32]. Such physics scenes are ideal for adding hand interaction. With the skeletal pose information from a tracked hand, it is straightforward to use these to animate the bones of a 3D model such as the rig of a skinned hand model. Such a hand model can then use collision detection to interact with virtual objects [12, 21]. Without real-world or haptic feedback, when the bones are controlled directly in this manner it can result in a hand that is too strong. There is nothing preventing it push a heavy box at high speed or penetrating right through a wall. Therefore we use the relative bone orientations from connected bones in the tracking data to drive the “muscles” of a virtual hand in the application - often described as the “powered ragdoll” approach [6]. The interactive simulation system in the application applies forces and torques (up to realistic limits) in an attempt to have the virtual hand match the hand pose provided by the tracking. See section 6, page 6).
Seo in view of You and Melax teach of visualization of a hand and Melax teaches that by creating a 3D skeletal tracking model the hand can be skinned to interact with other 3D models based on a user’s movement, therefore, it would have been obvious to one of ordinary skill in the art before the effective 

Regarding claim 15, Seo in view of You teaches the non-transitory computer readable medium of claim 13, wherein the video is live video from mobile device (Seo; Figure 10: Remote collaboration in an XR environment: (a) the worker fixes a chamber in AR mode and (b) the expert gives instructions to the worker using hand gesturing via remote video on a web browser See caption for figure 10)( Seo; Figure 10 is an example of remote collaboration in an XR environment. Figure 10(a) shows the repair process of a real chamber in an AR environment. The worker uses a web based AR browser to check the manual, as exhibited in Figure 8. The URI of the collaborative XR space is shared with the expert who can view a remote video stream of the worker's AR browser in real time, as shown in Figure 10(b). The expert uses an interaction device for hand gestures to instruct using hand gesturing within the collaboration space, as shown in Figure 9, and the instructions are shared and processed using the webizing interaction handler. See page 7, right col., second paragraph), 
the orientation sensor measurements are live orientation sensor measurements from the mobile device (You; Vision-based systems can use passive landmarks, but they are more computationally demanding and often exhibit erroneous behavior due to occlusion or numerical instability. Inertial sensors are completely passive, requiring no external devices or targets, however, the drift rates in portable strapdown configurations are too great for practical use. In this paper, we present a hybrid approach to AR tracking that integrates inertial and vision-based technologies. We exploit the complementary nature of the two technologies to compensate for the weaknesses in each component. Analysis and experimental results demonstrate this system's effectiveness. See Abstract), but is silent to and the hand information comprises hand skeleton joint measurements and hand visualization parameters determined from live measurements of the user hand from the tracking device.
Melax teaches a technique for full skeletal tracking of hands for placing for interaction with 3D physics based models (This paper presents a computationally-efficient, camera independent, scalable, physical-simulation-based approach for tracking 3D articulated skeletal models that is able to accurately track the human hand from a single depth sensor. Instead of using dynamics as an isolated step in the pipeline, such as the way an inverse kinematic solver would be applied only after placement of key features is somehow decided, our approach fits the hand to the depth data (or point cloud) by extending a physics system through adding additional constraints. Consequently, fitting the sensor data, avoiding interpenetrating fingers, preserving joint ranges, and exploiting temporal coherence and momentum are all constraints computed simultaneously in a unified solver. See page 1, right col., section 1, introduction last, paragraph)( Today, there already exists a wealth of interaction usages with 3D hand position data [22]. Rich interactive applications can be built merely by assembling physically enabled content [32]. Such physics scenes are ideal for adding hand interaction. With the skeletal pose information from a tracked hand, it is straightforward to use these to animate the bones of a 3D model such as the rig of a skinned hand model. Such a hand model can then use collision detection to interact with virtual objects [12, 21]. Without real-world or haptic feedback, when the bones are controlled directly in this manner it can result in a hand that is too strong. There is nothing preventing it push a heavy box at high speed or penetrating right through a wall. Therefore we use the relative bone orientations from connected bones in the tracking data to drive the “muscles” of a virtual hand in the application - often described as the “powered ragdoll” approach [6]. 
The interactive simulation system in the application applies forces and torques (up to realistic limits) in an attempt to have the virtual hand match the hand pose provided by the tracking. See section 6, page 6).

Claims 3 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Seo et al. (“Webizing Collaborative Interaction Space for Cross Reality with Various Human Interface Devices”, ACM, June, 2018.)(Hereinafter referred to as Seo) in view of You et al. (“Hybrid Inertial and Vision Tracking for Augmented Reality Registration”, IEEE. 2002)(Hereinafter referred to as You) in view of Melax et al. (“Dynamics Based 3D Skeletal Hand Tracking”, May, 2017.) in view of Groh et al. (“Aughanded Virtuality – The Hands in the Virtual Environment”, IEEE, 2015)(Hereinafter referred to as Groh).

Regarding claim 3, Seo in view of You in view of Melax teaches the non-transitory computer readable medium of claim 2, wherein the overlaying the representation of the user hand on the video for display by the mobile device based on the hand information comprises generating a 3D hand model of the user hand as the representation based on the hand skeleton joint measurements (Melax; This paper presents a computationally-efficient, camera independent, scalable, physical-simulation-based approach for tracking 3D articulated skeletal models that is able to accurately track the human hand from a single depth sensor. Instead of using dynamics as an isolated step in the pipeline, such as the way an inverse kinematic solver would be applied only after placement of key features is somehow decided, our approach fits the hand to the depth data (or point cloud) by extending a physics system through adding additional constraints. Consequently, fitting the sensor data, avoiding interpenetrating fingers, preserving joint ranges, and exploiting temporal coherence and momentum are all constraints computed simultaneously in a unified solver. See page 1, right col., section 1, introduction last, paragraph)(Melax; Today, there already exists a wealth of interaction usages with 3D hand position data [22]. Rich interactive applications can be built merely by assembling physically enabled content [32]. Such physics scenes are ideal for adding hand interaction. With the skeletal pose information from a tracked hand, it is straightforward to use these to animate the bones of a 3D model such as the rig of a skinned hand model. Such a hand model can then use collision detection to interact with virtual objects [12, 21]. Without real-world or haptic feedback, when the bones are controlled directly in this manner it can result in a hand that is too strong. There is nothing preventing it push a heavy box at high speed or penetrating right through a wall. 
Therefore we use the relative bone orientations from connected bones in the tracking data to drive the “muscles” of a virtual hand in the application - often described as the “powered ragdoll” approach [6]. The interactive simulation system in the application applies forces and torques (up to realistic limits) in an attempt to have the virtual hand match the hand pose provided by the tracking. See section 6, page 6), but is silent to and adjusting one or more of a color and a size of the 3D model as overlaid on the video based on one or more of hue information of the video and detected objects on the video.
Groh teaches a technique in which the user can modify various parameters of the overlay including Hue (We developed a toolbox which can be used to adjust the visual appearance of the video overlay. Thus, in contrast to related work the Aughanded Virtuality application provides functionalities to manipulate the user’s limbs beyond realistic concepts. It is possible to change the size, position and orientation of the video overlay. In addition, we implemented options to adjust the hue, saturation, brightness and transparency values of the final videostream pixels. See page 158, section 5 Toolbox and Figure 2).


Regarding claim 9, Seo in view of You in view of Melax teaches The non-transitory computer readable medium of claim 8, wherein the overlaying the representation of the user hand on the video for display by the device based on the hand information comprises generating a 3D hand model of the user hand as the representation based on the hand skeleton joint measurements (Melax; This paper presents a computationally-efficient, camera independent, scalable, physical-simulation-based approach for tracking 3D articulated skeletal models that is able to accurately track the human hand from a single depth sensor. Instead of using dynamics as an isolated step in the pipeline, such as the way an inverse kinematic solver would be applied only after placement of key features is somehow decided, our approach fits the hand to the depth data (or point cloud) by extending a physics system through adding additional constraints. Consequently, fitting the sensor data, avoiding interpenetrating fingers, preserving joint ranges, and exploiting temporal coherence and momentum are all constraints computed simultaneously in a unified solver. See page 1, right col., section 1, introduction last, paragraph)(Melax; Today, there already exists a wealth of interaction usages with 3D hand position data [22]. Rich interactive applications can be built merely by assembling physically enabled content [32]. Such physics scenes are ideal for adding hand interaction. With the skeletal pose information from a tracked hand, it is straightforward to use these to animate the bones of a 3D model such as the rig of a skinned hand model. Such a hand model can then use collision detection to interact with virtual objects [12, 21]. Without real-world or haptic feedback, when the bones are controlled directly in this manner it can result in a hand that is too strong. There is nothing preventing it push a heavy box at high speed or penetrating right through a wall. 
Therefore we use the relative bone orientations from connected bones in the tracking data to drive the “muscles” of a virtual hand in the application - often described as the “powered ragdoll” approach [6]. The interactive simulation system in the application applies forces and torques (up to realistic limits) in an attempt to have the virtual hand match the hand pose provided by the tracking. See section 6, page 6), but is silent to and adjusting one or more of a color and a size of the 3D model as overlaid on the video based on one or more of hue information of the video and detected objects on the video.
Groh teaches a technique in which the user can modify various parameters of the overlay including Hue (We developed a toolbox which can be used to adjust the visual appearance of the video overlay. Thus, in contrast to related work the Aughanded Virtuality application provides functionalities to manipulate the user’s limbs beyond realistic concepts. It is possible to change the size, position and orientation of the video overlay. In addition, we implemented options to adjust the hue, saturation, brightness and transparency values of the final videostream pixels. See page 158, section 5 Toolbox and Figure 2).
Seo in view of You in view of Melax teach of presenting hand information in a virtual environment and Groh teaches that various hand parameters can be modified to adjust the visual appearance of the overlay, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Seo in view of You in view of Melax with the overlay parameter adjustment technique of Groh such that the user could configure the overlay based on the specific application and parameters of the scene.


Claims 5 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Seo et al. (“Webizing Collaborative Interaction Space for Cross Reality with Various Human Interface Devices”, ACM, June, 2018.)(Hereinafter referred to as Seo) in view of You et al. (“Hybrid Inertial and Vision Tracking for Augmented Reality Registration”, IEEE. 2002)(Hereinafter referred to as You) in view of Chen et al. (“SEMarbeta: Mobile Sketch-Gesture-Video Remote Support for Car Drivers”, ACM, 2013.)(Hereinafter referred to as Chen).
Regarding claim 5, Seo in view of You teaches the non-transitory computer readable medium of claim 1, but is silent to wherein the instructions further comprise: transmitting audio recorded from a microphone to the another device; and outputting audio received from the another device.
Chen teaches a system of transmitting both video data which includes audio from a driver to a helper (Chen; The SEMarbeta condition was carried out in a wireless local area network. For the study, the driver subject worked on a Volvo V70 car while holding a Samsung Galaxy Tab 10.1 tablet PC. The SEMarbeta application for Android ran on the tablet and allowed the driver to start a video-call, transmit duplex painting information, and receive gestural information. See section 6.1 SEMarbeta vs. voice-only condition)(See figure 7).
Seo in view of You and Chen teach of remote assistance with visual cues and Chen teaches that the person needing assistance can provide audio, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Seo in view of You with the audio assistance technique of Chen such that the user could verbally ask for assistance from the expert.


Regarding claim 11, Seo in view of You teaches the non-transitory computer readable medium of claim 7, but is silent to wherein the instructions further comprise: transmitting audio recorded from a microphone to the mobile device; and outputting audio received from the mobile device.
Chen teaches a system of transmitting both video data which includes audio from a driver to a helper (Chen; The SEMarbeta condition was carried out in a wireless local area network. For the study, the driver subject worked on a Volvo V70 car while holding a Samsung Galaxy Tab 10.1 tablet PC. The SEMarbeta application for Android ran on the tablet and allowed the driver to start a video-call, transmit duplex painting information, and receive gestural information. See section 6.1 SEMarbeta vs. voice-only condition)(See figure 7).
Seo in view of You and Chen teach of remote assistance with visual cues and Chen teaches that the person needing assistance can provide audio, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Seo in view of You with the audio assistance technique of Chen such that the user could verbally ask for assistance from the expert.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Seo et al. (“Webizing Collaborative Interaction Space for Cross Reality with Various Human Interface Devices”, ACM, June, 2018.)(Hereinafter referred to as Seo) in view of You et al. (“Hybrid Inertial and Vision Tracking for Augmented Reality Registration”, IEEE. 2002)(Hereinafter referred to as You) in view of Ko et al. (“Introduction of Physics Simulation in Augmented Reality”, IEEE, 2008.)(Hereinafter referred to as Ko).

Regarding claim 19, Seo in view of You teaches the non-transitory computer readable medium of claim 18, but is silent to further comprising: matching a vertical direction of the hand information associated with the user hand to the gravity direction of the video.
Ko teaches applying physics simulation, including gravity, to virtual objects augmented onto a real scene (Ko; The first example shows a ball augmented onto a single marker. The ball has weight, which falls down due to gravity, collides with the floor and bounces off. Before the physical attributes are applied, the ball would be floating in the air, which is not realistic. However, by adding the physical attributes, a realistic behavior can be observed as demonstrated in Fig. 5. The second example shows that a box falls toward the floor and changes its motion depending on the angle of the floor as given in Fig. 6. See section 4, Example Result and Application).
Seo in view of You and Ko teach of virtual visual representations with mobile devices and Ko teaches by providing physics simulations the objects are presented in a more realistic manner, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Seo in view of You with the physics based integration techniques of Ko such that the virtual visualizations could be simulated based on physical environment data.



Allowable Subject Matter
Claims 17 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  The prior art of record alone or in combination is silent to the limitations “wherein adjusting a size of the 3D model as overlaid on the video is based on estimating a size of detected objects in vicinity of a portion of the 3D model as overlaid on the video.” of claim 17 when read in light of the rest of the limitations in claim 17 and the claims from which claim 17 depends and thus claim 17 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein the instructions further comprise: adjusting a color of the representation of the user hand as overlaid on the video based on calculating an average hue value in the video proximal to the representation of the user hand as overlaid on the video and determining a complementary hue value from the average hue value, the color corresponding to the complementary hue value.” of claim 20 when read in light of the rest of the limitations in claim 20 and the claims from which claim 20 depends and thus claim 20 contains allowable subject matter.




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)-272-7794.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/NICHOLAS R WILSON/Primary Examiner, Art Unit 2611